Autoscaling an endpoint
Autoscaling has long been one of the most important techniques for adjusting infrastructure size to incoming traffic, and it's available for SageMaker endpoints. However, it's based on Application Auto Scaling (https://docs.aws.amazon.com/autoscaling/application/userguide/what-is-application-auto-scaling.html) and not on EC2 Auto Scaling, although the concepts are extremely similar.
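To make the Application Auto Scaling connection concrete, here is a minimal sketch of how a SageMaker endpoint variant could be described as a scalable target. The helper names, the variant name `variant-1`, and the capacity bounds are assumptions chosen for illustration. The request keys are those of the boto3 `register_scalable_target` call.

```python
def scalable_target_request(endpoint_name, variant_name, min_capacity, max_capacity):
    """Build keyword arguments for Application Auto Scaling's register_scalable_target."""
    if max_capacity < min_capacity:
        raise ValueError('max_capacity must be greater than or equal to min_capacity')
    return {
        # Application Auto Scaling serves several AWS services; this selects SageMaker.
        'ServiceNamespace': 'sagemaker',
        # A scalable target is one production variant of one endpoint.
        'ResourceId': f'endpoint/{endpoint_name}/variant/{variant_name}',
        # The quantity that is scaled: the number of instances backing the variant.
        'ScalableDimension': 'sagemaker:variant:DesiredInstanceCount',
        'MinCapacity': min_capacity,
        'MaxCapacity': max_capacity,
    }


def register_target(request):
    """Send the request to AWS (needs boto3, credentials, and a live endpoint)."""
    import boto3  # imported here so that building the request works without boto3
    client = boto3.client('application-autoscaling')
    return client.register_scalable_target(**request)


# Hypothetical values: the variant name and the 1 to 4 instance range are assumptions.
request = scalable_target_request('xgboost-one-model-ep', 'variant-1', 1, 4)
```

Calling `register_target(request)` would apply it to a deployed endpoint. Unlike EC2 Auto Scaling, no Auto Scaling group is involved: the target is addressed by its resource ID and scalable dimension.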
Let's set up autoscaling for the XGBoost model we trained on the Boston Housing dataset:
- We first create an endpoint configuration, and we use it to build the endpoint. Here, we use the m5 instance family; t2 and t3 are not recommended for autoscaling as their burstable behavior makes it harder to measure their real load:
model_name = 'sagemaker-xgboost-2020-06-09-08-33-24-782'
endpoint_config_name = 'xgboost-one-model-epc'
endpoint_name = 'xgboost-one-model-ep'
production_variants = [{
    'VariantName'...
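As a minimal sketch of this step, assuming boto3, the endpoint configuration and the endpoint could be created as follows. The variant name `variant-1`, the instance type `ml.m5.large`, the initial instance count, and the helper names are assumptions for illustration, not values taken from the text. Rejecting t2 and t3 instances outright is a design choice of this sketch, stricter than the recommendation above.

```python
def is_burstable(instance_type):
    """Return True for the t2 and t3 families, whose burstable CPU obscures real load."""
    return instance_type.startswith(('ml.t2.', 'ml.t3.'))


def build_production_variants(model_name, variant_name='variant-1',
                              instance_type='ml.m5.large', initial_instance_count=1):
    """Build the ProductionVariants list for create_endpoint_config.

    The defaults are hypothetical; adjust them to your own model and traffic.
    """
    if is_burstable(instance_type):
        raise ValueError(f'{instance_type} is burstable; prefer a family such as m5')
    return [{
        'VariantName': variant_name,
        'ModelName': model_name,
        'InitialInstanceCount': initial_instance_count,
        'InstanceType': instance_type,
        'InitialVariantWeight': 1,
    }]


def create_endpoint(endpoint_config_name, endpoint_name, production_variants):
    """Create the configuration, then the endpoint, and wait until it is in service."""
    import boto3  # imported here so the helpers above work without boto3 installed
    sm = boto3.client('sagemaker')
    sm.create_endpoint_config(EndpointConfigName=endpoint_config_name,
                              ProductionVariants=production_variants)
    sm.create_endpoint(EndpointName=endpoint_name,
                       EndpointConfigName=endpoint_config_name)
    sm.get_waiter('endpoint_in_service').wait(EndpointName=endpoint_name)


variants = build_production_variants('sagemaker-xgboost-2020-06-09-08-33-24-782')
```

With AWS credentials configured, `create_endpoint('xgboost-one-model-epc', 'xgboost-one-model-ep', variants)` would deploy the model. Creating an endpoint typically takes several minutes.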