Autoscaling an endpoint
Autoscaling has long been the most important technique for adjusting infrastructure size to incoming traffic, and it's available for SageMaker endpoints. However, it's based on Application Auto Scaling (https://docs.aws.amazon.com/autoscaling/application/userguide/what-is-application-auto-scaling.html), not on EC2 Auto Scaling, although the concepts are extremely similar.
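To make the Application Auto Scaling connection concrete, here is a minimal sketch of how an endpoint variant could be described to that service with boto3. It only builds the request parameters; the call to AWS is wrapped in a separate function that is not invoked here. The variant name `variant-1`, the capacity bounds, the target value, and the helper names `build_autoscaling_requests` and `apply_autoscaling` are assumptions made for illustration, not values taken from this walkthrough.

```python
# Sketch only: variant name, capacity bounds and target value are assumptions.

def build_autoscaling_requests(endpoint_name, variant_name,
                               min_instances=1, max_instances=4,
                               invocations_per_instance=1000.0):
    """Build the two Application Auto Scaling requests for one endpoint variant."""
    # Application Auto Scaling identifies a SageMaker variant by this resource ID.
    resource_id = f"endpoint/{endpoint_name}/variant/{variant_name}"
    common = {
        "ServiceNamespace": "sagemaker",
        "ResourceId": resource_id,
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
    }
    # Request 1: declare the variant as a scalable target with instance bounds.
    target = dict(common, MinCapacity=min_instances, MaxCapacity=max_instances)
    # Request 2: a target tracking policy on invocations per instance.
    policy = dict(
        common,
        PolicyName=f"{endpoint_name}-{variant_name}-target-tracking",
        PolicyType="TargetTrackingScaling",
        TargetTrackingScalingPolicyConfiguration={
            "TargetValue": invocations_per_instance,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance",
            },
        },
    )
    return target, policy


def apply_autoscaling(target, policy):
    """Send both requests to AWS (needs boto3 and valid credentials)."""
    import boto3  # imported lazily so the builder above runs without boto3
    client = boto3.client("application-autoscaling")
    client.register_scalable_target(**target)
    client.put_scaling_policy(**policy)


target, policy = build_autoscaling_requests("xgboost-one-model-ep", "variant-1")
print(target["ResourceId"])
```

Keeping the request construction separate from the AWS call makes it possible to inspect or log the parameters before anything is applied to a live endpoint.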
Let's set up autoscaling for the XGBoost model we trained on the Boston Housing dataset:
- We first create an Endpoint Configuration, and we use it to build the endpoint. Here, we use the m5 instance family: t2 and t3 are not recommended for autoscaling, as their burstable behavior makes it harder to measure their real load:
    model_name = 'sagemaker-xgboost-2020-06-09-08-33-24-782'
    endpoint_config_name = 'xgboost-one-model-epc'
    endpoint_name = 'xgboost-one-model-ep'
    production_variants = [{
        'VariantName': '...