Adding speed with Z-ordering
Z-ordering is the process of collocating data related to common files. This can allow for data skipping and a significant reduction in processing time. Z-order is applied per column and should be used like partitions on columns when you’re filtering your table.
Here, we are applying Z-order to the whole table for a given column:
deltaTable.optimize().executeZOrderBy(COLUMN NAME)
We can also use the where
method to apply Z-ordering to a table slice:
deltaTable.optimize().where("date=' YYYY-MM-DD'").executeZOrderBy(COLUMUN NAME)
With that, we have looked at one type of performance enhancement with Delta tables: Z-ordering. Next, we will look at another critical performance enhancement, known as bloom filtering. What makes bloom filtering is that it’s a data structure that saves space and allows for data skipping.
Bloom filters
One way to increase read speed is to use bloom filters. A bloom filter is an...