Delta Lake as an offline feature store
In Chapter 3, Data Cleansing and Integration, we established data lakes as the scalable and relatively inexpensive choice for the long-term storage of historical data. We also presented some of the reliability challenges of cloud-based data lakes, and you learned how Delta Lake was designed to overcome them. The benefits of Delta Lake as an abstraction layer on top of cloud-based data lakes are not limited to data engineering workloads; they extend to data science workloads as well, and we will explore those benefits in this section.
Delta Lake is an ideal candidate for an offline feature store on cloud-based data lakes because of its data reliability features and its novel time travel features. We will discuss these in the following sections.
Structure and metadata with Delta tables
Delta Lake supports structured data with well-defined data types for columns. This makes Delta tables strongly typed...