Chapter 13: Optimizing Prediction Cost and Performance
In the previous chapter, you learned how to automate training and deployment workflows.
In this final chapter, we'll focus on optimizing cost and performance for prediction infrastructure, which typically accounts for 90% of AWS customers' machine learning spend. This figure may come as a surprise until we realize that a model built by a single training job may end up deployed on multiple endpoints that run 24/7 at large scale.
Hence, great care must be taken to optimize your prediction infrastructure to ensure that you get the most bang for your buck!
This chapter features the following topics:
- Autoscaling an endpoint
- Deploying a multi-model endpoint
- Deploying a model with Amazon Elastic Inference
- Compiling models with Amazon SageMaker Neo