5. Monitoring
Monitoring is vital for any ML system that reaches production. Traditional software systems are rule-based and deterministic. Thus, once it is built, it will always work as defined. Unfortunately, that is not the case with ML systems. When implementing ML models, we haven’t explicitly described how they should work. We have used data to compile a probabilistic solution, which means that our ML model will constantly be exposed to a level of degradation. This happens because the data from production might differ from the data the model was trained on. Thus, it is natural that the shipped model doesn’t know how to handle these scenarios.
We shouldn’t try to avoid these situations but create a strategy to catch and fix these errors in time. Intuitively, monitoring detects the model’s performance degradation, which triggers an alarm that signals that the model should be retrained manually, automatically, or with a combination of both.
Why...