Chapter 11: Deploying Machine Learning Models
In the previous chapters, we've deployed models in the simplest way possible: by configuring an estimator, calling the fit() API to train the model, and calling the deploy() API to create a real-time endpoint. This is undoubtedly the preferred scenario for development and testing, but it's not the only one.
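As a quick reminder, the workflow we've used so far can be sketched as follows. This is only a minimal sketch: train_and_deploy is a hypothetical helper name, the instance type and count are assumed defaults, and estimator stands for any estimator from the SageMaker SDK that has already been configured.

```python
def train_and_deploy(estimator, training_inputs,
                     instance_type="ml.m5.large", instance_count=1):
    """Train a model, then host it on a real-time endpoint.

    `estimator` is assumed to be an already-configured SageMaker SDK
    estimator. The default instance type and count are illustrative
    assumptions, not recommendations.
    """
    # fit() launches a training job on the given input channels.
    estimator.fit(training_inputs)

    # deploy() creates a real-time endpoint and returns a predictor
    # object that can be used to send prediction requests.
    predictor = estimator.deploy(
        initial_instance_count=instance_count,
        instance_type=instance_type,
    )
    return predictor
```

Since an endpoint is billed for as long as it is running, remember to delete it when you're done, for instance by calling predictor.delete_endpoint().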
Models can also be imported. For example, you could take an existing model that you trained on your local machine, import it into SageMaker, and deploy it as if you had trained it on SageMaker.
In addition, models can be deployed in different configurations:
- A single model on a real-time endpoint, which is what we've done so far, or several model variants on the same endpoint.
- A sequence of up to five models, called an inference pipeline.
- An arbitrary number of related models that are loaded on demand on the same endpoint, known as a multi-model endpoint. We'll examine this configuration in...