Monitoring data quality with Databricks Lakehouse Monitoring
Use Databricks Lakehouse Monitoring to detect and respond to deviations in your data distribution. Over time, the underlying patterns in your data may change. This can take the form of feature drift, where the distribution of feature data changes, or concept drift, where the relationship between a model's inputs and outputs changes. Both types of drift can degrade model quality. These changes can occur slowly or rapidly in production, which is why it is essential to monitor your data even before it becomes an input to your ML models and data products.
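To make feature drift concrete, here is a minimal sketch that compares a baseline sample of one numeric feature against a newer sample using the Population Stability Index (PSI). This is an illustration only, not how Lakehouse Monitoring computes its drift metrics: the function name, the equal-width binning, the smoothing constant, and the synthetic data are all assumptions made for the example.

```python
import math
import random


def population_stability_index(baseline, current, bins=10):
    """PSI between two numeric samples, using equal-width bins over the baseline range.

    Hypothetical helper for illustration; not part of any Databricks API.
    """
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if the baseline is constant

    def bin_fractions(values):
        counts = [0] * bins
        for v in values:
            # Values outside the baseline range are clamped into the edge bins.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # Floor each fraction at a small constant so the logarithm is defined.
        return [max(c / len(values), 1e-6) for c in counts]

    p = bin_fractions(baseline)
    q = bin_fractions(current)
    # Each term is non-negative, so PSI is 0 only when the binned fractions match.
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


# Synthetic data (an assumption): one feature sampled at three points in time.
rng = random.Random(0)
baseline = [rng.gauss(0.0, 1.0) for _ in range(5000)]
stable = [rng.gauss(0.0, 1.0) for _ in range(5000)]   # same distribution
shifted = [rng.gauss(1.0, 1.0) for _ in range(5000)]  # mean has drifted

psi_same = population_stability_index(baseline, baseline)
psi_stable = population_stability_index(baseline, stable)
psi_shifted = population_stability_index(baseline, shifted)
print(f"identical: {psi_same:.4f}  stable: {psi_stable:.4f}  shifted: {psi_shifted:.4f}")
```

The drifted sample should score noticeably higher than the stable one. A check like this only looks at input distributions, so it can surface feature drift but not concept drift, which requires comparing predictions against ground-truth labels.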
Mechanics of Lakehouse Monitoring
To monitor a table in Databricks, you create a monitor attached to that table. To monitor the performance of an ML model, you attach the monitor to an inference table that holds the model’s inputs and corresponding predictions. Databricks Lakehouse Monitoring provides the following...