You're reading from Data Engineering with Databricks Cookbook: Build effective data and AI solutions using Apache Spark, Databricks, and Delta Lake

Product type Paperback

Published in May 2024

Publisher Packt

ISBN-13 9781837633357

Length 438 pages

Edition 1st Edition

Tools

Apache Spark

Concepts

Data Engineering

Author (1):

Pulkit Chadha

Table of Contents (16) Chapters

Preface

1. Part 1 – Working with Apache Spark and Delta Lake

2. Chapter 1: Data Ingestion and Data Extraction with Apache Spark

3. Chapter 2: Data Transformation and Data Manipulation with Apache Spark

4. Chapter 3: Data Management with Delta Lake

5. Chapter 4: Ingesting Streaming Data

6. Chapter 5: Processing Streaming Data

7. Chapter 6: Performance Tuning with Apache Spark

8. Chapter 7: Performance Tuning in Delta Lake

9. Part 2 – Data Engineering Capabilities within Databricks

10. Chapter 8: Orchestration and Scheduling Data Pipeline with Databricks Workflows

11. Chapter 9: Building Data Pipelines with Delta Live Tables

12. Chapter 10: Data Governance with Unity Catalog

13. Chapter 11: Implementing DataOps and DevOps on Databricks

14. Index

15. Other Books You May Enjoy

Writing the output of Apache Spark Structured Streaming to a sink such as Delta Lake

In this recipe, you will learn how to write the output of an Apache Spark Structured Streaming query to a sink, using Delta Lake as the example. Delta Lake can serve as a unified storage layer for many kinds of data, which reduces data silos within an organization. Using it as the sink for streaming data also simplifies your pipelines and your overall data architecture. Delta Lake supports unified analytics as well: Apache Spark, Databricks, SQL, and machine learning (ML) libraries can all work with the same data in a single environment. This versatility makes Delta Lake a good fit for real-time processing and analytics pipelines that need reliable, durable, and consistent data, simpler data management, and support for compliance requirements.
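As a rough preview of what writing a stream to Delta Lake can look like, here is a minimal PySpark sketch. It is not the recipe's own code. It assumes a Spark environment where the Delta Lake format is available (for example, a Databricks cluster, or open source Spark with the `delta-spark` package configured). The helper names, the `/tmp/delta_sink_demo` path, and the directory layout are hypothetical and chosen only for illustration.

```python
from typing import Dict


def delta_sink_paths(base_path: str) -> Dict[str, str]:
    """Derive table and checkpoint locations from one base directory.

    The layout is a hypothetical convention for this sketch, not a Delta Lake rule.
    """
    base = base_path.rstrip("/")
    return {
        "table": f"{base}/events",
        "checkpoint": f"{base}/_checkpoints/events",
    }


def write_stream_to_delta(stream_df, base_path: str):
    """Start an append-mode streaming write of `stream_df` into a Delta table."""
    paths = delta_sink_paths(base_path)
    return (
        stream_df.writeStream
        .format("delta")
        .outputMode("append")
        # The checkpoint records progress so the query can resume after a restart.
        .option("checkpointLocation", paths["checkpoint"])
        .start(paths["table"])
    )


def run_demo(base_path: str = "/tmp/delta_sink_demo") -> None:
    """Stream synthetic rows from Spark's built-in rate source into Delta."""
    # Imported here so the helpers above can be loaded without a Spark installation.
    from pyspark.sql import SparkSession

    # Assumption: the session (or cluster) is already configured for Delta Lake.
    spark = SparkSession.builder.appName("delta-sink-sketch").getOrCreate()

    # The rate source emits rows with `timestamp` and `value` columns.
    events = spark.readStream.format("rate").option("rowsPerSecond", 5).load()

    query = write_stream_to_delta(events, base_path)
    query.awaitTermination(10)  # let the query run for about 10 seconds
    query.stop()
```

Calling `run_demo()` in such an environment should leave a Delta table under the `events` directory that you can read back with `spark.read.format("delta").load(...)`. Keeping the checkpoint in its own directory, separate from the table data, is a common choice because the checkpoint belongs to the streaming query rather than to the table.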