Chapter 5: Architecting Data Engineering Pipelines
Having gained an understanding of data engineering principles, the core concepts, and the available AWS tools, we can now put these together in the form of a data pipeline. A data pipeline is a process that ingests data from multiple sources, optimizes and transforms the data, and makes it available to data consumers. An important part of the data engineering role is designing, or architecting, these pipelines.
In this chapter, we will cover the following topics:
- Approaching the task of architecting a data pipeline
- Identifying data consumers and understanding their requirements
- Identifying data sources and ingesting data
- Identifying data transformations and optimizations
- Loading data into data marts
- Wrapping up the whiteboarding session
- Hands-on – architecting a sample pipeline