Delta table performance optimization
Delta Engine is a high-performance query engine that handles most optimization itself. However, there are some additional optimization techniques that you can apply, and we are going to cover them in this recipe.
Using Delta Lake on Azure Databricks, you can optimize the data stored in cloud storage. The two algorithms supported by Delta Lake on Databricks are bin-packing and Z-ordering:
- Compaction (bin-packing) – The speed of read queries can be improved by compacting many small files into fewer, larger ones. Whenever data is required, instead of searching through a large number of small files, the Spark engine reads a smaller number of Delta files of up to 1 GB each.
- Z Order – Delta Lake on Databricks provides a technique to arrange related information in the same set of files, which reduces the amount of data that needs to be read. You specify the column name in the ZORDER BY clause to collocate the data.
- VACUUM – You...
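As a rough illustration of the compaction and Z-ordering options described above, the following minimal sketch builds the corresponding OPTIMIZE statements as strings. The helper build_optimize_sql, the table name sales.customer, and the column c_custkey are hypothetical names chosen for this example and are not taken from the recipe; the sketch also assumes you would run the resulting statements in a Databricks notebook, where a spark session is already available.

```python
def build_optimize_sql(table_name, zorder_columns=None):
    """Build a Delta Lake OPTIMIZE statement (hypothetical helper).

    Without zorder_columns, the statement only compacts small files
    (bin-packing). With zorder_columns, it also collocates related data
    by the given columns using a ZORDER BY clause.
    """
    statement = f"OPTIMIZE {table_name}"
    if zorder_columns:
        statement += " ZORDER BY (" + ", ".join(zorder_columns) + ")"
    return statement


# Hypothetical table and column names, used only for illustration.
compaction_sql = build_optimize_sql("sales.customer")
zorder_sql = build_optimize_sql("sales.customer", ["c_custkey"])

print(compaction_sql)
print(zorder_sql)

# In a Databricks notebook, where `spark` is predefined, the statements
# could then be executed like this:
# spark.sql(compaction_sql)
# spark.sql(zorder_sql)
```

Z-ordering tends to help most when the chosen column appears frequently in query filters, so the column passed to the helper should reflect how the table is actually queried.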