Monitoring data and performance drift in SageMaker Studio
In this chapter, let's consider an ML scenario: we train an ML model and host it on an endpoint. We also create artificial inference traffic to the endpoint, injecting random perturbations into each data point to introduce noise, missing values, and drift into the data. We then create a data quality monitor and a model quality monitor using SageMaker Model Monitor. For this demonstration, we use a simple ML dataset, the abalone dataset from UCI (https://archive.ics.uci.edu/ml/datasets/abalone). Using this dataset, we train a regression model to predict the number of rings, which indicates the age of an abalone (per the dataset description, the age in years is the number of rings plus 1.5).
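To make the idea of perturbed traffic concrete, here is a minimal sketch of how noise, missingness, and drift could be injected into a single abalone record before it is sent to an endpoint. It is a hypothetical illustration using only the Python standard library, not the code from the chapter's notebook: the function names `perturb` and `to_csv_line`, the feature order, and the parameters `noise_scale`, `missing_prob`, and `drift` are all assumptions chosen for clarity.

```python
import random

# Assumed feature order for one abalone record (the Rings label is excluded)
FEATURES = ["sex", "length", "diameter", "height",
            "whole_weight", "shucked_weight", "viscera_weight", "shell_weight"]


def perturb(record, rng, noise_scale=0.05, missing_prob=0.1, drift=0.0):
    """Return a perturbed copy of one record (hypothetical helper).

    - noise: scale each numeric value by (1 + Gaussian noise)
    - missingness: replace a numeric value with None with probability missing_prob
    - drift: shift every numeric value by a constant relative amount `drift`
    """
    out = []
    for name, value in zip(FEATURES, record):
        if name == "sex":
            out.append(value)  # leave the categorical feature untouched
            continue
        if rng.random() < missing_prob:
            out.append(None)  # simulate a missing value
            continue
        noisy = value * (1.0 + rng.gauss(0.0, noise_scale)) * (1.0 + drift)
        out.append(round(noisy, 4))
    return out


def to_csv_line(record):
    # Serialize as CSV; missing values become empty fields
    return ",".join("" if v is None else str(v) for v in record)


if __name__ == "__main__":
    rng = random.Random(0)
    # An example record in the abalone feature layout
    sample = ["M", 0.455, 0.365, 0.095, 0.514, 0.2245, 0.101, 0.15]
    print(to_csv_line(perturb(sample, rng, drift=0.1)))
```

Keeping noise, missingness, and drift as separate knobs makes it easy to generate traffic that should trip a data quality monitor (for example, a large `drift`) alongside traffic that should not (all three set to zero).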
Training and hosting a model
We will follow these steps to set up what we need before model monitoring, namely getting the data, training a model, hosting it, and creating traffic:
- Open the notebook in Getting-Started-with-Amazon-SageMaker-Studio/chapter10/01-train_host_predict...