Delta Lake is an open source project released by Databricks in 2019. It stores data in Parquet files and, on top of them, keeps a transaction log of every change, enabling a data scientist to query the data as it existed at a given point in time (a feature known as time travel). This can be useful when trying to determine why a particular ML model's accuracy drifted. Delta Lake also maintains metadata about the data, which can yield up to a 10-times performance improvement over plain Parquet for analytics workloads.
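As a brief illustration of time travel, the following minimal PySpark sketch reads an existing Delta table both at its current state and as it existed at an earlier version or timestamp. The table path `/tmp/delta/events`, the version number, and the timestamp are hypothetical; on Databricks, Delta support is available out of the box, while running this locally requires the delta-spark package and the corresponding Spark session configuration.

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("delta-time-travel")
    .getOrCreate()
)

# Hypothetical path to an existing Delta table.
table_path = "/tmp/delta/events"

# Read the table as it exists now.
current_df = spark.read.format("delta").load(table_path)

# Time travel: read the table as it existed at an earlier version.
v0_df = (
    spark.read.format("delta")
    .option("versionAsOf", 0)
    .load(table_path)
)

# Time travel by timestamp instead of version number.
snapshot_df = (
    spark.read.format("delta")
    .option("timestampAsOf", "2023-01-01")
    .load(table_path)
)
```

Comparing `current_df` with one of these historical snapshots is one way to check whether a change in the underlying data coincides with a drop in model accuracy.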
While consideration is given to both choosing a device and setting up Databricks, the rest of this chapter follows a modular, recipe-based format.