Optimizing Data Pipelines
The growing importance of data has led companies to invest significantly more in their data platforms. Over time, this has made it a higher priority for them to understand what their data pipelines do and how they do it, and therefore to monitor not only the quality of the outcomes but also the health of the pipelines themselves. At the same time, they are monitoring resource usage and tracking the associated costs.
In this chapter, we will see how data observability offers a way to make the governance of our data pipelines scalable and sustainable. First, we will look at data pipelines themselves: their main components, their types, and their characteristics. Then, we will learn how data observability and, in particular, data lineage can be used to manage several aspects of the data pipeline life cycle, such as costs and risks.
In this chapter, we’ll cover the...