Chapter 3: Ingesting and Processing Data in a Data Lakehouse
In the previous chapter, we gave an overview of the architectural components of a data lakehouse, taking a bird's-eye view of its seven layers and then describing each of them in detail. This chapter covers the architectural patterns for the first two layers of a data lakehouse:
- The data ingestion layer
- The data processing layer
These two layers are covered together because they are closely linked: data flows from the ingestion layer to the processing layer, and many of the same tools and technologies are used in both.
This chapter is divided into five sections. We will start by exploring the differences between the extract, transform, load (ETL) and extract, load, transform (ELT) data transformation patterns. Then, we will dive deeper into the methods for ingesting and processing batch data. After that, we will do the same for streaming data...