Basic concepts of Amazon SageMaker Endpoint Production Variants
In this section, you review the basics of deploying and updating ML models using SageMaker Endpoint Production Variants. You can deploy a machine learning model with SageMaker in two ways: a real-time endpoint for low-latency, live predictions, or a batch transform for asynchronous predictions on large numbers of inference requests. Production Variants apply to real-time endpoints.
Deploying a real-time endpoint involves two steps:
- Creating an Endpoint Configuration
An endpoint configuration identifies one or more Production Variants. Each Production Variant specifies a model and the infrastructure (instance type and instance count) to deploy that model on.
- Creating an Endpoint Pointing to the Endpoint Configuration
Creating the endpoint results in an HTTPS endpoint that model consumers can use to invoke the model.
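The two steps above can be sketched in Python. This is a minimal illustration, not a complete deployment script: it assumes a SageMaker model has already been created, and the helper names `build_production_variant` and `deploy_endpoint` are hypothetical, introduced here only for the example. The `create_endpoint_config` and `create_endpoint` calls and the Production Variant fields are those of the boto3 SageMaker client.

```python
def build_production_variant(variant_name, model_name, instance_type,
                             instance_count, weight=1.0):
    """Describe one Production Variant: a model plus the infrastructure to host it.

    Hypothetical helper for this example; it only assembles the request
    dictionary and makes no AWS calls.
    """
    return {
        "VariantName": variant_name,
        "ModelName": model_name,                 # name of an existing SageMaker model
        "InstanceType": instance_type,           # for example "ml.m5.large"
        "InitialInstanceCount": instance_count,
        "InitialVariantWeight": weight,          # relative share of traffic
    }


def deploy_endpoint(sm_client, endpoint_name, config_name, variants):
    """Run the two deployment steps against a SageMaker client.

    sm_client is expected to behave like boto3.client("sagemaker").
    """
    # Step 1: create the endpoint configuration that lists the Production Variants.
    sm_client.create_endpoint_config(
        EndpointConfigName=config_name,
        ProductionVariants=variants,
    )
    # Step 2: create the endpoint that points to that configuration.
    return sm_client.create_endpoint(
        EndpointName=endpoint_name,
        EndpointConfigName=config_name,
    )
```

To run this against a real account, you would pass `boto3.client("sagemaker")` as `sm_client`, along with names of your choosing, for example `deploy_endpoint(sm_client, "my-endpoint", "my-endpoint-config", [build_production_variant("VariantA", "my-model-v1", "ml.m5.large", 1)])`. All of these names are placeholders, and the call creates billable resources.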
The following diagram shows two different endpoint configurations with Production Variants...