Part 4 – Productionalizing Data Engineering Pipelines – Orchestration and Tuning
In this part, Chapter 10 covers data pipeline orchestration, focusing on task coordination and failure handling, and introduces tools such as Apache Airflow, Argo, Databricks Workflows, and Azure Data Factory. Chapter 11 covers performance tuning: it explains how the Spark UI supports performance optimization, then moves through the basics of tuning, resource optimization, and data handling techniques such as managing data skew, indexing, and partitioning.
This part has the following chapters:
- Chapter 10, Data Pipeline Orchestration
- Chapter 11, Performance Tuning