Understanding components of Deequ
Deequ provides several features that simplify data quality checks. The following diagram shows its major components:
Figure 7.1 – Components of Deequ
We can observe the following components:
- Metrics computation: Deequ computes data quality metrics, such as the completeness or the maximum of a column. You can directly access the raw metrics computed on the data.
- Constraint verification: When you define a set of data quality constraints, Deequ automatically derives the metrics that need to be computed on the data and uses them to validate those constraints.
- Constraint suggestion: You can either define your own data quality constraints or use Deequ’s automated constraint suggestion methods, which profile the data to infer useful constraints.
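To make the relationship between these three components more concrete, here is a minimal conceptual sketch in plain Python. It is not Deequ's API and does not use Spark. Every name in it (`completeness`, `maximum`, `verify`, `suggest`, and the sample rows) is hypothetical and chosen only for illustration. It shows the idea that constraints name the metrics they depend on, that each required metric is computed once, and that simple constraints can be inferred from the data itself.

```python
# Conceptual sketch only: plain Python, NOT Deequ's API.
# All names and the sample data below are hypothetical.

rows = [
    {"id": 1, "price": 10.0},
    {"id": 2, "price": None},
    {"id": 3, "price": 25.5},
]

# Metrics computation: raw numbers computed over a column.
def completeness(rows, column):
    """Fraction of rows whose value in `column` is not None."""
    return sum(r[column] is not None for r in rows) / len(rows)

def maximum(rows, column):
    """Largest non-null value in `column`."""
    return max(r[column] for r in rows if r[column] is not None)

METRICS = {"completeness": completeness, "maximum": maximum}

# Constraint verification: each constraint is
# (metric name, column, predicate applied to the metric value).
constraints = [
    ("completeness", "id", lambda v: v == 1.0),
    ("completeness", "price", lambda v: v == 1.0),
    ("maximum", "price", lambda v: v <= 100.0),
]

def verify(rows, constraints):
    """Derive the metrics the constraints need, compute each once, then check."""
    needed = {(metric, column) for metric, column, _ in constraints}
    computed = {(m, c): METRICS[m](rows, c) for m, c in needed}
    return [
        (m, c, computed[(m, c)], check(computed[(m, c)]))
        for m, c, check in constraints
    ]

# Constraint suggestion: infer candidate constraints from the data.
def suggest(rows):
    """Suggest a completeness constraint for every column without nulls."""
    return [
        ("completeness", column)
        for column in rows[0]
        if completeness(rows, column) == 1.0
    ]

for metric, column, value, passed in verify(rows, constraints):
    print(f"{metric}({column}) = {value} -> {'passed' if passed else 'failed'}")
print("suggested:", suggest(rows))
```

In this sketch, the `price` completeness constraint fails because one of the three sample rows has no price, and the only suggested constraint is completeness on `id`. Deequ applies the same idea to Spark DataFrames, with a much richer set of metrics and constraints.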
Under the hood, Deequ uses Apache Spark for metrics computation, which allows it to scale to large, distributed datasets. In the upcoming sections, we are going to cover these features...