Demystifying Delta
The Delta protocol is based on Parquet and has several components. Let's look at its composition. The transaction log is the secret sauce that supports key features such as ACID compliance, schema evolution, and time travel and unlocks the power of Delta. It is an ordered record of every change that's been made to the table by users and can be regarded as the single source of truth. The following diagram shows the sub-components that are broadly regarded as part of a Delta table:
The main point to highlight is that metadata lies alongside data in the transaction logs. Before this, all the metadata was in the metastore. However, when the data is changing frequently, it would be too much information to store in a metastore, and storing just the last state means lineage and history will be lost. In the context of big data, the transaction history and metadata changes are also big data by...