Built-in performance-boosting features of Delta Lake
Delta Lake provides built-in performance boosters that complement the data layout strategies discussed in the Optimizing the data layout section. When an effective data layout strategy is in place, these boosters accelerate performance further. When the layout strategy is lacking, or is limited because queries filter the data in many different ways, the boosters still improve performance by reducing unnecessary I/O. Let’s learn about these performance boosters.
Automatic statistics collection
The first, and arguably the most important, performance booster is automatic statistics collection (stats collection for short), which enables a process called data skipping. Stats collection happens automatically in Delta Lake: for every data file written, it computes the minimum and maximum values for the columns present in the file.
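To make the idea concrete, the following is a minimal conceptual sketch in plain Python of how per-file minimum and maximum values allow a reader to skip files. It is not Delta Lake's implementation and does not use any Delta Lake API. The names FileStats, collect_stats, and files_to_read, the file names, and the sample values are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class FileStats:
    """Hypothetical per-file statistics for a single numeric column."""
    path: str
    min_value: int
    max_value: int


def collect_stats(path, values):
    # Compute the minimum and maximum of the column values stored in one file.
    return FileStats(path, min(values), max(values))


def files_to_read(stats, lower, upper):
    # Keep only files whose [min, max] range overlaps the filter range
    # [lower, upper]; every other file cannot contain a matching row.
    return [
        s.path
        for s in stats
        if s.max_value >= lower and s.min_value <= upper
    ]


# Hypothetical data files and the values of one column in each of them.
files = {
    "part-000.parquet": [1, 5, 9],
    "part-001.parquet": [10, 14, 19],
    "part-002.parquet": [20, 27, 30],
}

stats = [collect_stats(path, values) for path, values in files.items()]

# A filter such as "value BETWEEN 12 AND 15" only needs the second file.
selected = files_to_read(stats, 12, 15)
print(selected)
```

In this sketch, two of the three files are never opened, because their value ranges cannot satisfy the filter. The tighter and less overlapping the per-file ranges are, the more files a filter can skip, which is why this mechanism works well together with a good data layout.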
By default, stats collection...