Summary
In this chapter, we looked at the problems that affect data quality and at potential solutions for improving it.
We saw that data quality issues come from three key areas. The first is the source system, where data might be incomplete, unreliable, or inconsistent. The second is the infrastructure and pipelines that process and transform data as it is ingested. Here, data quality can suffer because the data arrives too late, is corrupted, or is missing, or because a transformation contains a mistake, for example when we get the granularity or precision of the data wrong. The third is data governance, most notably inconsistencies in definitions and documentation or a lack of access management, cost management, or metadata management. This all leads to a misunderstanding of the data and the data’s lineage and dependencies, which, in turn, leads to suboptimal decision-making and increased...