Understanding Delta Lake
As mentioned before, modern data lakes lack some critical features, such as ACID transactions, indexing, and versioning. Their absence can negatively affect the reliability, quality, and performance of data.
Important Note
In data warehouse terms, ACID stands for atomicity, consistency, isolation, and durability. Together, these properties are intended to make database transactions accurate, reliable, and permanent.
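To make atomicity concrete, here is a minimal sketch that uses Python's built-in sqlite3 module. It is not Delta Lake or Spark code; SQLite is used only because it is a small ACID-compliant engine that needs no setup. The accounts table, the account names, and the transfer helper are hypothetical and exist only for this illustration.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` between two hypothetical accounts as a single transaction."""
    try:
        # Using the connection as a context manager commits if the block
        # succeeds and rolls back every statement in it if an exception is raised.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)
            ).fetchone()
            if balance < 0:
                # Abort: neither UPDATE above should remain visible.
                raise ValueError("insufficient funds")
        return True
    except ValueError:
        return False


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("a", 100), ("b", 0)])
conn.commit()

ok = transfer(conn, "a", "b", 30)       # succeeds and is committed
failed = transfer(conn, "a", "b", 500)  # fails and is rolled back as a whole
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, failed, balances)
```

The second transfer runs both updates before the check fails, yet neither change survives: the transaction is applied entirely or not at all.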
The following diagram depicts the properties:
Delta Lake functions as a layer on top of the distributed computing framework Apache Spark. By design, Apache Spark falls short on transaction management in the following ways:
- It does not lock the existing data during edit transactions, which means data may be unavailable for a very brief period while it is being overwritten.
- During data overwrites, there is a chance that the old data gets deleted yet the new data...