What this book covers
Chapter 1, Fundamentals of Data Quality Monitoring, gives a general introduction to data quality and explains the key metrics used to measure it. It also explains how data quality can be converted into Service Level Agreements (or contracts) to establish trust among data pipeline stakeholders.
Chapter 2, Fundamentals of Data Observability, completes the reader's knowledge of data quality by adding the observability dimension. It explains how data quality monitoring can be extended to provide real-time contextual information on data pipelines.
Chapter 3, Data Observability Techniques, covers how a data engineer can retrieve information from applications at run time. It gives an overview of the existing techniques and explains their advantages and disadvantages for implementing data observability efficiently.
Chapter 4, Data Observability Elements, provides an overview of the elements needed to collect contextual and real-time information from a pipeline. It describes those elements and shows an example of how you can collect them within a Python script that performs data manipulation.
Chapter 5, Defining Rules on Indicators, introduces the concept of continuous data validation. It explains how the data engineer can implement rules, manually or in the code, to test the data, and where such validation rules can be applied.
Chapter 6, Root Cause Analysis, focuses on data issues and how adopting the data observability approach simplifies, and may even automate, anomaly detection and troubleshooting. It provides a method for data incident management along with anomaly detection examples.
Chapter 7, Optimizing Data Pipelines, explains how data observability can be used to manage several aspects of the data pipeline lifecycle, such as containing the cost of pipeline maintenance, automating documentation, managing the catalog, mitigating anomalies, and reducing the risk of change.
Chapter 8, Organizing Data Teams and Measuring the Success of Data Observability, focuses on how to introduce data observability in your team. It describes the different kinds of data teams, the types of organizations these teams must fit into, and how to measure the success of the initiative.
Chapter 9, Data Observability Checklist, suggests a checklist-based method for implementing data observability in your company's pipelines and reviews the common pitfalls and concerns we encountered when implementing it in various companies.
Chapter 10, Pathway to Data Observability, closes the book by providing data engineers with a technical roadmap to implement data observability in a first project and then at scale across the organization.