Summary
As we conclude Chapter 5, we have covered the main aspects of feature engineering on Databricks. We learned how to organize features into feature tables in both SQL and Python, each of which requires a non-nullable primary key. Unity Catalog provides lineage and discoverability, which makes those features reusable. Continuing with the streaming project, we also created a streaming feature using stateful streaming. Finally, we touched on the latest feature engineering capabilities from Databricks, such as point-in-time lookups, on-demand feature functions, and publishing tables to the Databricks Online Store. These capabilities can reduce time to production and simplify production pipelines.
You are now ready to tackle feature engineering for a variety of scenarios! Next up, we take what we’ve learned to build training sets and machine learning models in Chapter 6.