Summary
In this chapter, we looked at different ways of implementing data observability in our data pipelines and weighed the pros and cons of each method.
The first method we considered relies on analyzing the data source from outside the application, either synchronously or asynchronously with the execution that modifies the data. While this approach seems easy to implement, maintaining an external application can be a burden, and the observability metrics it gathers may be misleading because the approach does not obey the three data observability principles: contextual, synchronous, and continuous validation.
A variant of this technique is to start from what the application can tell you about its data transformations: analyze the logs produced during the execution and replay the run to collect metrics. Again, this method is not fully in line with the data observability principles and introduces complexity outside the original application.
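As a reminder of what the log-based variant involves, here is a minimal sketch. It assumes a hypothetical log format (`run=`, `step=`, `rows_in=`, `rows_out=`) and a hypothetical helper named `collect_metrics`; a real application's logs will look different, and the sample lines are invented for illustration only.

```python
import re
from collections import defaultdict

# Hypothetical log format, assumed for illustration; real logs will differ.
LOG_PATTERN = re.compile(
    r"run=(?P<run>\S+) step=(?P<step>\S+) "
    r"rows_in=(?P<rows_in>\d+) rows_out=(?P<rows_out>\d+)"
)


def collect_metrics(log_lines):
    """Group per-step row counts by run ID, reading them from log lines."""
    metrics = defaultdict(list)
    for line in log_lines:
        match = LOG_PATTERN.search(line)
        if match is None:
            continue  # skip lines that carry no transformation metrics
        metrics[match["run"]].append(
            {
                "step": match["step"],
                "rows_in": int(match["rows_in"]),
                "rows_out": int(match["rows_out"]),
            }
        )
    return dict(metrics)


# Invented sample lines that follow the assumed format.
sample_logs = [
    "INFO run=r1 step=load rows_in=100 rows_out=100",
    "INFO starting filter",
    "INFO run=r1 step=filter rows_in=100 rows_out=80",
]
result = collect_metrics(sample_logs)
```

Note that the parser lives outside the application and must track every change to the log format, which is the extra complexity this method introduces.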
The method that covers the most data observability principles...