Using Auto Optimize
Auto Optimize is a feature that automatically compacts small files during individual writes to a Delta table. Unlike bin-packing with the OPTIMIZE command, we do not need to run a command manually each time. It consists of two components:
- Optimized Writes: Databricks dynamically adjusts Spark partition sizes so that files of approximately 128 MB are written for each table partition.
- Auto Compaction: After a write to a table completes, Databricks runs a job that compacts any remaining small files, coalescing them into files of approximately 128 MB. It runs on the partitions with the greatest number of small files.
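To make the two steps above concrete, here is a purely conceptual sketch (not Databricks' actual implementation) of what auto compaction does: greedily pick the partition with the most small files and bin-pack them toward the 128 MB target. The small-file threshold and the greedy selection heuristic are illustrative assumptions.

```python
# Conceptual sketch of auto compaction (illustrative only; the 128 MB target
# comes from the text, the selection heuristic is an assumption).
TARGET = 128 * 1024 * 1024  # 128 MB target file size

def pick_partition(partitions, small=TARGET // 4):
    """Choose the partition with the greatest number of small files."""
    return max(partitions, key=lambda p: sum(1 for s in partitions[p] if s < small))

def coalesce(file_sizes, target=TARGET):
    """Bin-pack file sizes into groups of roughly `target` bytes each."""
    groups, current, total = [], [], 0
    for size in sorted(file_sizes):
        if current and total + size > target:
            groups.append(current)          # flush the full group
            current, total = [], 0
        current.append(size)
        total += size
    if current:
        groups.append(current)
    return groups

# Hypothetical table layout: one partition full of 1 MB files, one already compacted.
partitions = {
    "date=2023-01-01": [1_000_000] * 50,          # fifty small files
    "date=2023-01-02": [128 * 1024 * 1024] * 3,   # three 128 MB files
}
target_partition = pick_partition(partitions)      # picks "date=2023-01-01"
compacted = coalesce(partitions[target_partition]) # fifty 1 MB files fit in one group
```

The point of the sketch is the shape of the work, not the exact policy: compaction trades many small reads for fewer, larger ones, which is why it targets the partitions where small files accumulate fastest.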
Next, we will learn about the Spark configurations for Auto Optimize and go through a worked-out example to understand how it works:
- To enable Auto Optimize for all new tables, we need to run the following Spark configuration code:
```sql
%sql
SET spark.databricks.delta.properties.defaults.autoOptimize.optimizeWrite = true;
SET spark.databricks.delta.properties.defaults.autoOptimize.autoCompact = true;
```
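The settings above only affect tables created after they are set. For a table that already exists, the same behavior can be enabled per table through Delta table properties; `my_table` below is a placeholder name:

```sql
-- Enable Auto Optimize on an existing Delta table (my_table is a placeholder)
ALTER TABLE my_table
SET TBLPROPERTIES (
  delta.autoOptimize.optimizeWrite = true,
  delta.autoOptimize.autoCompact = true
);
```

Setting the property on the table rather than in the session configuration means every writer to that table gets the behavior, regardless of their own Spark settings.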