Monitoring data drift
The famous Greek philosopher, Heraclitus, said "Change is the only constant in life."
Drift refers to movement away from an expected norm. In the world of data, drift applies in several contexts: drift in the data, in the model, in performance, and in business metrics. Most model drift is caused by drift in the data. We detect drift in a model by monitoring its predictive quality using the F1 score, precision, recall, and other metrics. If the values fall below a certain threshold, this signals that the business logic needs to be re-evaluated. Drift is usually detected at the model stage, but that is too late in the pipeline. Profiling the data continuously helps detect drift sooner.
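The two checks described above can be illustrated with a minimal sketch in plain Python. It is not a prescribed implementation: the function names, the F1 threshold of 0.8, and the rule of flagging a mean that shifts by more than three baseline standard deviations are all assumptions chosen for illustration, and a real profile would track many more statistics.

```python
from statistics import mean, stdev


def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall, and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def model_drift_detected(y_true, y_pred, f1_threshold=0.8):
    """Flag model drift when F1 falls below the threshold.

    The default threshold is a hypothetical value, not a recommendation.
    """
    _, _, f1 = precision_recall_f1(y_true, y_pred)
    return f1 < f1_threshold


def profile(values):
    """Summarize a numeric column with two basic statistics."""
    return {"mean": mean(values), "stdev": stdev(values)}


def data_drift_detected(baseline, current, max_shift=3.0):
    """Flag data drift when the current mean lies more than max_shift
    baseline standard deviations from the baseline mean (an assumed rule)."""
    base, cur = profile(baseline), profile(current)
    if base["stdev"] == 0:
        return cur["mean"] != base["mean"]
    return abs(cur["mean"] - base["mean"]) / base["stdev"] > max_shift


# Model-level check: needs labelled outcomes, so it fires late in the pipeline.
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 0, 0, 0, 1, 1]
print(model_drift_detected(y_true, y_pred))  # True

# Data-level check: compares a fresh batch against a baseline profile.
baseline = [10.0, 11.0, 9.0, 10.5, 9.5]
current = [14.0, 15.0, 13.5, 14.5, 15.5]
print(data_drift_detected(baseline, current))  # True
```

The model-level check can only run once true labels are available, whereas the profile comparison works on incoming data alone, which is why continuous profiling tends to surface drift earlier.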
Drift can be classified into two categories, as follows:
- Data drift:
  - New fields get added, older fields get dropped or changed, or the statistical quality of the data changes because the product was introduced...