In this part, we will explore the essentials of data operations with Apache Spark and Delta Lake, covering data ingestion, extraction, transformation, and manipulation in support of business analytics. We will then look at Delta Lake for reliable data management with ACID transactions and versioning, and cover streaming data ingestion and processing for real-time insights. The part concludes with performance tuning strategies for both Apache Spark and Delta Lake, which help keep data processing efficient within the Lakehouse architecture.
This part contains the following chapters:
- Chapter 1, Data Ingestion and Data Extraction with Apache Spark
- Chapter 2, Data Transformation and Data Manipulation with Apache Spark
- Chapter 3, Data Management with Delta Lake
- Chapter 4, Ingesting Streaming Data
- Chapter 5, Processing Streaming Data
- Chapter 6, Performance Tuning with Apache Spark
- Chapter 7, Performance Tuning in Delta Lake