Analyzing the data
The simplest and most immediate way to monitor what is happening is to focus on your data. At the beginning of our data quality journey, it seems natural to start with the data, because the data is exactly what our customers will be using.
But as we saw in the previous chapter, the data itself is often not sufficient to identify and prevent anomalies. For example, even if we are sure that the data we are producing is correct, we may be overlooking the fact that our new data pipeline is slowly increasing its execution time and its use of hardware resources. Today, we may not notice an anomaly because it is not yet a critical problem and the data is produced as expected, but it could become an issue before we realize it. Indeed, what appears to be a pipeline execution issue may soon turn into a timeliness issue.
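To make the slow-drift scenario concrete, here is a minimal sketch of how a gradual increase in execution time could be flagged even while the data itself remains correct. Everything in it is an assumption made for illustration: the function names `duration_trend` and `is_drifting`, the 2% growth threshold, and the sample durations are hypothetical and not taken from any particular tool.

```python
from statistics import mean


def duration_trend(durations):
    """Least-squares slope of run durations against run index (seconds per run)."""
    n = len(durations)
    if n < 2:
        raise ValueError("need at least two runs to estimate a trend")
    xs = range(n)
    x_bar = mean(xs)
    y_bar = mean(durations)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, durations))
    denominator = sum((x - x_bar) ** 2 for x in xs)
    return numerator / denominator


def is_drifting(durations, max_growth_per_run=0.02):
    """True when the fitted slope exceeds a fraction of the average duration.

    max_growth_per_run is a hypothetical threshold: 0.02 means the pipeline
    is flagged if it slows down by more than 2% of its mean duration per run.
    """
    return duration_trend(durations) > max_growth_per_run * mean(durations)


# Hypothetical execution times (in seconds) of the last seven pipeline runs.
recent_runs = [600, 612, 625, 640, 655, 671, 690]
drifting = is_drifting(recent_runs)
```

No single run in this series would look alarming on its own, which is why the check looks at the trend across runs instead of comparing each run against a fixed limit.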
On the other hand, our main concern is to be aware of what is happening in our data. So, what does...