Optimizing your model deployment
Optimizing model deployment is a critical topic for businesses. No one wants to be spending a dime more than they need to. Because deployed endpoints are being used continuously, and incurring charges continuously, making sure that the deployment is optimized in terms of cost and runtime performance can save you a lot of money. SageMaker has several options to help you reduce costs while optimizing the runtime performance. In this section, we will be discussing multi-model endpoint deployment and how to choose the instance type and autoscaling policy for your use case.
Hosting multi-model endpoints to save costs
A multi-model endpoint is a type of real-time endpoint in SageMaker that allows multiple models to be deployed behind the same endpoint. There are many use cases in which you would build models for each customer or for each geographic area, and depending on the characteristics of the incoming data point, you would apply the corresponding...