The Modern Data Stack
In this chapter, we will explore the modern data architecture that has emerged for building scalable and flexible data platforms. Specifically, we will cover the Lambda architecture pattern and how it combines real-time data processing with batch analytics. You will learn about the key components of the Lambda architecture: the batch layer, which processes the full history of the data; the speed layer, which processes new data in real time; and the serving layer, which merges the results of both to answer queries over a unified view. We will discuss how technologies such as Apache Spark (batch and stream computation), Apache Kafka (event ingestion), and Apache Airflow (scheduling and orchestrating batch jobs) can be used to implement these layers at scale.
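To make the three layers concrete before turning to real tooling, here is a minimal in-memory sketch in Python. It is a toy under stated assumptions, not how Spark, Kafka, or Airflow implement anything: the class name LambdaPipeline, its methods, and the page-view counting use case are all hypothetical and chosen only for illustration.

```python
from collections import Counter


class LambdaPipeline:
    """Toy Lambda architecture that counts page views (hypothetical example)."""

    def __init__(self):
        self.master_log = []            # append-only record of every event
        self.batch_view = Counter()     # batch layer output: counts as of the last batch run
        self.realtime_view = Counter()  # speed layer output: counts since the last batch run

    def ingest(self, page):
        # Every event is stored durably and also updates the speed layer immediately.
        self.master_log.append(page)
        self.realtime_view[page] += 1

    def run_batch(self):
        # Batch layer: recompute the view from the full history. Once it has
        # caught up, the speed layer's partial counts are no longer needed.
        self.batch_view = Counter(self.master_log)
        self.realtime_view.clear()

    def query(self, page):
        # Serving layer: merge the batch and real-time views into one answer.
        return self.batch_view[page] + self.realtime_view[page]


pipeline = LambdaPipeline()
for page in ["home", "home", "pricing"]:
    pipeline.ingest(page)

before_batch = pipeline.query("home")  # answered from the speed layer only
pipeline.run_batch()
pipeline.ingest("home")
after_batch = pipeline.query("home")   # batch view plus one new real-time event
```

In this sketch the batch run and ingestion never overlap, so clearing the real-time view is safe. In a real deployment the two run concurrently, and the speed layer has to keep the events that arrived after the batch job took its snapshot.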
By the end of the chapter, you will understand the core design principles and technology choices for building a modern data lake. You will be able to explain the benefits of the Lambda architecture over traditional data warehouse designs. Most importantly, you will have the conceptual foundation to start architecting your own modern data platform.
...