Checking data quality
As mentioned in the Understanding the medallion architecture section, it is now time to define the rules and constraints that we want our data to adhere to. These rules are usually defined by the business team, but it is not uncommon for the development team to define the constraints themselves. For our purposes, we want our Bronze layer data to adhere to the following rules:
- At least 50% of the records should be of the Download Content, Event Registration, or Survey Response conversion event types
- At least 90% of the records must have product group specified
- Country name must be present for all records
We will write to the Silver layer only if all of the preceding checks pass.
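To make the three rules and the "write only if everything passes" gate concrete, here is a minimal plain-Scala sketch that works on an in-memory collection and does not use Deequ or Spark. The Conversion case class, its field names, and the BronzeRules and writeIfValid names are all assumptions made for illustration; they are not the book's schema or API.

```scala
// Hypothetical record shape; the field names are assumptions, not the book's schema.
final case class Conversion(
    eventType: String,
    productGroup: Option[String],
    country: Option[String]
)

object BronzeRules {
  // The three conversion event types named in the first rule.
  private val targetEvents =
    Set("Download Content", "Event Registration", "Survey Response")

  // Fraction of records satisfying a predicate; 0.0 for an empty batch.
  private def ratio(rows: Seq[Conversion])(p: Conversion => Boolean): Double =
    if (rows.isEmpty) 0.0 else rows.count(p).toDouble / rows.size

  // True only when all three rules hold. An empty batch fails the ratio rules.
  def allPass(rows: Seq[Conversion]): Boolean = {
    val eventOk   = ratio(rows)(r => targetEvents.contains(r.eventType)) >= 0.5
    val productOk = ratio(rows)(_.productGroup.exists(_.nonEmpty)) >= 0.9
    val countryOk = rows.forall(_.country.exists(_.nonEmpty))
    eventOk && productOk && countryOk
  }

  // `body` is a by-name parameter, so it is evaluated only when the checks pass.
  def writeIfValid[A](rows: Seq[Conversion])(body: => A): Option[A] =
    if (allPass(rows)) Some(body) else None
}
```

A caller would wrap the Silver layer write in the body, for example BronzeRules.writeIfValid(rows) { /* write to Silver */ }, and treat a None result as a failed quality gate. Passing the write as a by-name block keeps the checks and the side effect in one place, so the write cannot run by accident before validation.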
For this example, we will create a DeequChecks class that has a runIfSuccess method. This method takes a body of code that will be executed only if the constraints defined in the caller evaluate to true:
package com.packt.dewithscala.chapter12
import com...