Databricks Feature Engineering in Unity Catalog
In this chapter, we will focus on several types of features. Feature types can be roughly grouped based on when they are calculated relative to the time of model prediction. The three types we cover in this chapter are batch, streaming, and on-demand:
- Batch features: These are features that are generated hours, days, or even further ahead of when the feature will be used as a model input. There are several reasons for this; the model may be offline and run in a batch prediction fashion, or perhaps the feature values change slowly relative to the model inference timeline. In these cases, it is not necessary to recompute the values more frequently. An example of a batch feature is the `holidays` field in our Favorita Sales Forecasting project, as we can compute it before we need it and don’t expect the values to change frequently.
- Streaming features: These are features processed in real or near-real time as the source data...