Summary
In this chapter, we saw why data quality is important: focusing on it helps us prevent and resolve issues in data processes. We explored the dimensions of data quality and the measures that can be taken for each of them.
Next, we analyzed the data maturity path that companies embarked on years ago and are still following, and how progress along this path is creating an urgent need for an ever-greater focus on data quality.
We also defined the producer-consumer information bias, which leads to a shift in responsibilities among data pipeline stakeholders. To address it, we proposed the service-level method, which works in four steps.
First, data quality must be treated as a service-level agreement (SLA): a contract between the producer and the consumer that specifies the level of quality the data users require.
Second, the data producers translate these agreements into a set of objectives, each of which supports one or more agreements.
Third, to ensure that the objectives are met, the producer sets up indicators that reflect the state of the data.
Finally, the indicators are used to detect quality issues through rules that, when violated, alert the data producer so that it can take action. The outcome of those rules can be compiled into a scorecard, which addresses the information bias problem by keeping everyone informed about the objectives and how they are monitored.
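The four steps of the service-level method can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the method as implemented in any particular tool: the Agreement, Objective, and Indicator classes, the completeness indicator, and the evaluate and score helpers are hypothetical names, and the data is assumed to be an in-memory list of dictionaries. Each objective acts as one rule; a violated rule triggers an alert, and the pass/fail results form the scorecard.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, object]  # assumption: one record of the dataset


@dataclass
class Indicator:
    """Indicator: a measurable signal reflecting the state of the data."""
    name: str
    measure: Callable[[List[Row]], float]


@dataclass
class Objective:
    """Objective: a producer-defined target for one indicator."""
    indicator: Indicator
    minimum: float  # lowest acceptable value of the indicator


@dataclass
class Agreement:
    """Agreement (SLA): the consumer's expected quality, backed by objectives."""
    consumer: str
    objectives: List[Objective]


def completeness(column: str) -> Indicator:
    """Hypothetical indicator: share of rows where `column` is not None."""
    def measure(rows: List[Row]) -> float:
        if not rows:
            return 0.0
        return sum(r.get(column) is not None for r in rows) / len(rows)
    return Indicator(f"completeness({column})", measure)


def evaluate(agreement: Agreement, rows: List[Row],
             alert: Callable[[str], None]) -> Dict[str, bool]:
    """Check each objective as a rule; alert the producer on every violation.

    Returns a scorecard mapping each indicator name to whether its rule passed.
    """
    scorecard: Dict[str, bool] = {}
    for obj in agreement.objectives:
        value = obj.indicator.measure(rows)
        passed = value >= obj.minimum
        scorecard[obj.indicator.name] = passed
        if not passed:
            alert(f"[{agreement.consumer}] {obj.indicator.name}="
                  f"{value:.2f} < {obj.minimum}")
    return scorecard


def score(scorecard: Dict[str, bool]) -> float:
    """Fraction of rules that passed (1.0 when there are no rules)."""
    return sum(scorecard.values()) / len(scorecard) if scorecard else 1.0


# Usage: one agreement with two objectives over a tiny sample dataset.
rows: List[Row] = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": None},
    {"id": 3, "email": "c@example.com"},
    {"id": 4, "email": "d@example.com"},
]
sla = Agreement("marketing", [
    Objective(completeness("id"), 1.0),
    Objective(completeness("email"), 0.9),
])
alerts: List[str] = []
card = evaluate(sla, rows, alerts.append)
print(card, score(card))
print(alerts)
```

Here the email column is only 75% complete against a 90% objective, so one rule fails, one alert is raised, and the scorecard shows half of the rules passing. Passing the alert function in as a parameter keeps the sketch independent of any specific notification channel.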
In the next chapter, we will see why those indicators are the backbone of data observability and how data quality can be turned into data observability.