Optimizing Delta tables in a Synapse Spark pool lake database
As covered in the Processing data using Spark pools and lake databases recipe of Chapter 8, Processing Data Using Azure Synapse Analytics, a lake database allows you to store processed data in Delta tables, which are backed by Parquet files. Delta tables are well suited to storing processed data that is consumed by reporting solutions such as Power BI.
To get the best performance out of Delta tables, it is essential to keep the data evenly distributed across the underlying Parquet files and to purge files that are no longer needed. The OPTIMIZE command compacts many small Parquet files into fewer, larger ones, while the VACUUM command removes Parquet files that are no longer referenced by the Delta transaction log from the Azure Data Lake filesystem. Both commands need to be run regularly against the lake database to keep the queries issued against its Delta tables performing well.
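As a quick illustration, the following snippet shows how the two commands can be issued from a Synapse notebook using Spark SQL. The lakedb database and sales table are placeholder names, and the retention period shown is Delta Lake's 7-day default:

```python
# Minimal sketch, assuming a Synapse notebook where the spark session is
# predefined, and a Delta table named sales in a lake database named
# lakedb (both names are hypothetical).

# Compact small Parquet files into fewer, larger, evenly sized ones.
spark.sql("OPTIMIZE lakedb.sales")

# Remove Parquet files that are no longer referenced by the Delta log
# and are older than the retention threshold (168 hours = 7 days).
spark.sql("VACUUM lakedb.sales RETAIN 168 HOURS")
```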
In this recipe, we will write a script that scans all the Delta tables in a lake database and runs the OPTIMIZE and VACUUM commands on each of them.
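A rough sketch of that approach is shown below. It assumes a Synapse notebook with the predefined spark session, uses a hypothetical lakedb database name, and relies on DESCRIBE DETAIL (supported by the Delta Lake version shipped with the Synapse Spark runtime) to skip objects that are not Delta tables:

```python
# Sketch of a maintenance script that optimizes and vacuums every
# Delta table in a lake database. All names are illustrative.

def optimize_lake_database(db_name: str, retain_hours: int = 168) -> None:
    """Run OPTIMIZE and VACUUM on every Delta table in db_name."""
    for table in spark.catalog.listTables(db_name):
        full_name = f"{db_name}.{table.name}"
        try:
            # DESCRIBE DETAIL reports the storage format of the table;
            # skip anything that is not backed by Delta.
            detail = spark.sql(f"DESCRIBE DETAIL {full_name}").first()
            if detail["format"] != "delta":
                continue
            spark.sql(f"OPTIMIZE {full_name}")
            spark.sql(f"VACUUM {full_name} RETAIN {retain_hours} HOURS")
            print(f"Optimized and vacuumed {full_name}")
        except Exception as exc:  # e.g. views or unsupported tables
            print(f"Skipped {full_name}: {exc}")

optimize_lake_database("lakedb")
```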