Configuring Auto Scaling
To enable auto scaling for your model, you can use the SageMaker console, the AWS Command Line Interface (AWS CLI), or an AWS SDK; the CLI and SDKs both go through the Application Auto Scaling API. With the CLI or API, the process has three steps: register the model as a scalable target, define the scaling policy, and apply it. With the SageMaker console, go to Endpoints under Inference in the navigation pane, find your model’s endpoint name, and select it together with the variant name to configure auto scaling.
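The three API steps above can be sketched in Python with boto3's Application Auto Scaling client. This is a minimal sketch, not a definitive setup: the helper names, the endpoint name "my-endpoint", the variant name "AllTraffic", the policy name, the capacity limits, and the target value of 70 invocations per instance are all illustrative assumptions, not values from this article. The request builders are plain functions so they can be inspected without AWS credentials.

```python
def scalable_target_request(endpoint_name, variant_name, min_capacity, max_capacity):
    """Build the request that registers an endpoint variant as a scalable target."""
    return {
        "ServiceNamespace": "sagemaker",
        # SageMaker variants are addressed as endpoint/<name>/variant/<variant>.
        "ResourceId": f"endpoint/{endpoint_name}/variant/{variant_name}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "MinCapacity": min_capacity,
        "MaxCapacity": max_capacity,
    }


def target_tracking_policy_request(target, policy_name, invocations_per_instance):
    """Define a target tracking policy for an already-described scalable target."""
    return {
        "PolicyName": policy_name,
        "ServiceNamespace": target["ServiceNamespace"],
        "ResourceId": target["ResourceId"],
        "ScalableDimension": target["ScalableDimension"],
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            # Assumed target: average invocations per instance to hold steady.
            "TargetValue": float(invocations_per_instance),
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
            },
        },
    }


def enable_auto_scaling(client, target, policy):
    """Register the target, then apply the policy, using the given client."""
    client.register_scalable_target(**target)
    client.put_scaling_policy(**policy)


# Example usage (requires AWS credentials and an existing endpoint):
#   import boto3
#   client = boto3.client("application-autoscaling")
#   target = scalable_target_request("my-endpoint", "AllTraffic", 1, 4)
#   policy = target_tracking_policy_request(target, "invocations-target", 70)
#   enable_auto_scaling(client, target, policy)
```

Keeping the request builders separate from the call that sends them makes it easy to review or log the exact parameters before anything is changed on the endpoint.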
Let’s now dive into the intricacies of scaling policies.
Scaling Policy Overview
Auto scaling is driven by scaling policies, which determine how instances are added or removed in response to varying workloads. Two options are at your disposal: target tracking and step scaling policies.
Target Tracking Scaling Policies: Our recommendation is to leverage target tracking scaling...