The role of Delta in an ML pipeline
Delta's support for ACID transactions, schema evolution, and time travel is useful when designing ML pipelines. Let us examine each of the four cooperating pipelines involved in creating and managing an ML asset.
Delta-backed feature store
Feature engineering is time-consuming: it involves resource-intensive computation and requires domain knowledge. Poor feature engineering can degrade the quality of ML models, so features should be computed with care.
Features are the inputs to ML models and have to be computed from raw data. Feature augmentation and precomputed features call for a feature store that computes those features ahead of time and makes them available at both training and serving time.
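To make the idea of one store feeding both training and serving concrete, here is a minimal sketch in plain Python. It is a toy in-memory stand-in and not the Delta API: the class `VersionedFeatureTable`, its methods, and the feature name `avg_basket` are all hypothetical and exist only for illustration. Each commit produces a new table version, so a training job can pin the version it was built on while serving reads the latest one. With a real Delta table, the comparable pinned read would use the `versionAsOf` reader option.

```python
import copy


class VersionedFeatureTable:
    """Toy append-only feature table (hypothetical, not the Delta API).

    Every commit stores a full snapshot for simplicity. A real Delta table
    keeps a transaction log over data files instead of copying snapshots.
    """

    def __init__(self):
        # The list index doubles as the version number.
        self._snapshots = []

    def commit(self, rows):
        """Upsert feature rows keyed by entity id; return the new version."""
        snapshot = copy.deepcopy(self._snapshots[-1]) if self._snapshots else {}
        for entity_id, features in rows.items():
            snapshot.setdefault(entity_id, {}).update(features)
        self._snapshots.append(snapshot)
        return len(self._snapshots) - 1

    def read(self, version=None):
        """Return the table at a given version (latest if version is None)."""
        if not self._snapshots:
            raise ValueError("table has no commits yet")
        if version is None:
            version = len(self._snapshots) - 1
        if not 0 <= version < len(self._snapshots):
            raise ValueError(f"unknown version: {version}")
        # Return a copy so callers cannot mutate stored history.
        return copy.deepcopy(self._snapshots[version])


store = VersionedFeatureTable()

# Initial feature computation for two entities.
v0 = store.commit({"user_1": {"avg_basket": 20.0}, "user_2": {"avg_basket": 35.5}})

# A later recomputation updates one entity and creates a new version.
v1 = store.commit({"user_1": {"avg_basket": 22.5}})

# Training pins the version it was built on, which keeps runs reproducible.
training_features = store.read(version=v0)

# Serving reads the latest committed features.
serving_features = store.read()
```

Pinning a version for training while serving from the latest commit is one way to keep a trained model reproducible even as features continue to be recomputed.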
Features can be of several types, such as transformative which requires category encoding, context...