Part 5 – End-to-End Data Pipelines
In this part, Chapter 12 applies the skills you have acquired so far to build a batch pipeline, highlighting the importance of batch processing in data engineering. It covers a typical business use case, ingestion, transformation, quality checks, and orchestration.
Chapter 13 builds a streaming pipeline, emphasizing real-time data ingestion through Azure Event Hubs configured as an Apache Kafka endpoint for Spark integration. Using Spark's Structured Streaming with Scala, it covers understanding the use case, ingesting and transforming data, loading the serving layer, and orchestration, preparing you to implement similar pipelines in your own organization.
This part has the following chapters:
- Chapter 12, Building Batch Pipelines Using Spark and Scala
- Chapter 13, Building Streaming Pipelines Using Spark and Scala