Computing on-demand features
Calculating the number of transactions per customer in a brief time window works in a streaming fashion because it needs only historical data. When a feature depends on data that is available only at inference time, we use an on-demand feature, whose value is unknown until inference time. In Databricks, you can create on-demand features with Python user-defined functions (UDFs). These Python UDFs can then be invoked via training_set configurations to create training datasets, as you will see in Chapter 6.
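To make the idea concrete, here is a minimal sketch of the kind of logic an on-demand feature function might contain. It is plain Python rather than a registered Databricks UDF, and the function name, parameter names, and null-handling rules are assumptions made for illustration, not the implementation used later in the book. It combines a value that arrives with the request and a value that was computed ahead of time.

```python
from typing import Optional


def ratio_to_stored_max(
    request_value: Optional[float], stored_max: Optional[float]
) -> Optional[float]:
    """Hypothetical on-demand feature: a request-time value relative to a
    precomputed maximum. Names and rules are illustrative assumptions."""
    # Return None instead of raising when an input is missing or the
    # stored maximum is not positive, so no invalid ratio is produced.
    if request_value is None or stored_max is None or stored_max <= 0:
        return None
    return request_value / stored_max


# request_value is only known once the request arrives;
# stored_max stands in for a value looked up from precomputed features.
print(ratio_to_stored_max(45.0, 60.0))  # 0.75
```

Keeping the logic in a small, pure function like this makes it easy to unit test before wrapping it in whatever UDF mechanism the platform provides.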
Let’s consider the Streaming Transactions project again. We want to add a feature that compares the amount a product sold for with its historical maximum price, and use it as part of the training data to predict the generated classification label. In this scenario, we don’t know the purchase price until the transaction has been received. We’ll cover how to build a Python UDF for calculating an on-demand feature for the...