Deploying multiple models behind a single inference endpoint
A SageMaker inference endpoint is a logical entity that fronts a load balancer and one or more instances running your inference container. Behind a single endpoint, you can deploy either multiple versions of the same model or entirely different models. In this section, we'll look at both use cases.
Multiple versions of the same model
A SageMaker endpoint lets you host multiple model versions, each serving a configurable percentage of incoming traffic. This capability supports common continuous integration (CI)/continuous delivery (CD) practices such as canary and blue/green deployments. While these strategies are similar, they serve slightly different purposes, as explained here:
- In a canary deployment, you route a small percentage of traffic to the new version of the model, which lets you test it on a subset of requests until you are satisfied that it is working well.
- In a blue/green deployment, you run the new version alongside the current one and switch all traffic to the new version at once, keeping the previous version in place so you can roll back quickly if problems appear.
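To make the traffic-splitting idea concrete, here is a minimal sketch of the `ProductionVariants` structure that the SageMaker `CreateEndpointConfig` API accepts; the model and variant names are illustrative, and boto3 calls are shown only in comments so the snippet stays self-contained.

```python
# Sketch: an endpoint configuration with two production variants set up for
# a canary deployment. Traffic is split in proportion to each variant's
# InitialVariantWeight. Model/variant names below are hypothetical.

production_variants = [
    {
        "VariantName": "current-version",
        "ModelName": "my-model-v1",       # hypothetical existing model
        "InstanceType": "ml.m5.large",
        "InitialInstanceCount": 2,
        "InitialVariantWeight": 9.0,      # ~90% of traffic
    },
    {
        "VariantName": "canary-version",
        "ModelName": "my-model-v2",       # hypothetical new model version
        "InstanceType": "ml.m5.large",
        "InitialInstanceCount": 1,
        "InitialVariantWeight": 1.0,      # ~10% of traffic
    },
]

def traffic_split(variants):
    """Each variant's traffic share is its weight divided by the sum of all weights."""
    total = sum(v["InitialVariantWeight"] for v in variants)
    return {v["VariantName"]: v["InitialVariantWeight"] / total for v in variants}

print(traffic_split(production_variants))
# → {'current-version': 0.9, 'canary-version': 0.1}

# In real code, you would then pass this list to the SageMaker API, e.g.:
#   boto3.client("sagemaker").create_endpoint_config(
#       EndpointConfigName="my-endpoint-config",
#       ProductionVariants=production_variants,
#   )
```

Because weights are relative rather than absolute percentages, you can later shift traffic between the variants (for example, with `UpdateEndpointWeightsAndCapacities`) without redeploying the models.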