Detecting anomalies
Deequ supports anomaly detection in data by using metrics stored in a MetricsRepository, which we covered in the previous section. For example, we can create a rule to check whether the number of records has increased by more than 50% compared to the previous run. If it has, the check fails.
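Before bringing Deequ in, it can help to see the arithmetic behind such a rule. The following is a minimal plain-Scala sketch, not Deequ's implementation: the object `RecordCountCheck`, the method `withinAllowedIncrease`, and its `maxIncrease` parameter are hypothetical names introduced only for illustration. It assumes that an increase of exactly 50% is still acceptable.

```scala
// Hypothetical helper, not part of Deequ: illustrates a relative-increase rule.
object RecordCountCheck {

  /** Returns true when `current` exceeds `previous` by at most `maxIncrease`
    * (0.5 means 50%). Assumption: an increase of exactly `maxIncrease` passes.
    */
  def withinAllowedIncrease(
      previous: Long,
      current: Long,
      maxIncrease: Double = 0.5
  ): Boolean = {
    require(previous > 0, "previous count must be positive")
    // Compare the ratio of the two counts with the allowed upper bound.
    current.toDouble / previous <= 1.0 + maxIncrease
  }

  def main(args: Array[String]): Unit = {
    // 2 records yesterday, 3 today: a 50% increase, within the limit.
    println(withinAllowedIncrease(previous = 2, current = 3)) // prints "true"
    // 2 records yesterday, 5 today: a 150% increase, over the limit.
    println(withinAllowedIncrease(previous = 2, current = 5)) // prints "false"
  }
}
```

Deequ automates this comparison: it reads the previous value of the metric from the repository, so we do not have to track yesterday's count ourselves.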
To show how this works, we will use a fictitious scenario in which we receive a batch of products to add to the inventory each day. We want to check whether the number of products received on a given day has increased by more than 50% compared to the previous run. For this example, we will store the metrics in an in-memory repository. As before, let’s define the dataframes we will use in this example:
  val session = Spark.initSparkSession("de-with-scala")
  import session.implicits._
  val yesterdayDF = Seq((1, "Product 1", 100), (2, "Product 2", 50)).toDF(
    "product_id",...