Chapter 5: Introducing Delta Engine
Delta Engine is the query engine of Delta Lake and is included by default in Azure Databricks. It is designed to optimize the processing of data in our Delta Lake in several ways, thanks to optimized layouts and improved data indexing. These optimizations include dynamic file pruning (DFP), Z-Ordering, Auto Compaction, ad hoc processing, and more. An added benefit is that several of them take place automatically, just by using Delta Lake. You will be using Delta Engine optimization in many ways.
In this chapter, you will learn how to use Delta Engine to optimize your Delta Lake ETL in Azure Databricks. We will focus on the following topics:
- Optimizing file management with Delta Engine
- Optimizing queries using DFP
- Using Bloom filters
- Optimizing join performance
Delta Engine is all about optimization...