Enabling data capture and simulating predictions
After an ML model has been deployed to an inference endpoint, its quality needs to be monitored and checked so that we can perform corrective actions as soon as quality issues or deviations are detected. This is similar to web application development, where even if the quality assurance team has already spent days (or weeks) testing the final build of the application, there can still be issues that are only detected once the web application is already running in production:
Figure 8.8 – Capturing the request and response data of the ML inference endpoint
As shown in the preceding diagram, model monitoring starts by capturing the request and response data passing through a running ML inference endpoint. This collected data is then processed and analyzed in a later step using a separate automated task or job that can generate reports and flag issues or anomalies. If we deployed our ML model in a custom...
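To make this concrete, here is a minimal sketch of how data capture could be enabled when deploying an endpoint with the SageMaker Python SDK, followed by a few sample invocations to simulate predictions so that capture files start accumulating. This is an illustrative example, not the exact code used in this chapter: it assumes a trained `model` object (for example, one obtained from a SageMaker Estimator) is already available, and the S3 bucket, instance type, and sample payloads are placeholders.

```python
from sagemaker.model_monitor import DataCaptureConfig
from sagemaker.serializers import CSVSerializer
from sagemaker.deserializers import CSVDeserializer

# Placeholder S3 location where captured request/response files will be stored.
capture_uri = "s3://<your-bucket>/data-capture"

# Capture 100% of the requests and responses passing through the endpoint.
data_capture_config = DataCaptureConfig(
    enable_capture=True,
    sampling_percentage=100,
    destination_s3_uri=capture_uri,
)

# Deploy the (already trained) model with data capture enabled.
# `model` is assumed to be an existing sagemaker.model.Model instance.
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.large",
    data_capture_config=data_capture_config,
    serializer=CSVSerializer(),
    deserializer=CSVDeserializer(),
)

# Simulate a few predictions so that capture files appear under capture_uri.
for payload in ["42", "68", "91"]:
    predictor.predict(payload)
```

With a configuration along these lines, each invocation of the endpoint is written to the configured S3 location as JSON Lines capture files, which a separate processing job can later analyze to generate reports and flag anomalies.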