Summary
In this final chapter, you learned several techniques for reducing prediction costs with SageMaker. First, you saw how to use autoscaling to scale prediction infrastructure according to incoming traffic. Then, you learned how to deploy an arbitrary number of models on the same endpoint, thanks to multi-model endpoints.
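As a quick refresher, here's a minimal sketch of what setting up a target-tracking autoscaling policy on an existing endpoint looks like with boto3. The endpoint name, capacity bounds, and invocation target below are hypothetical placeholders:

    import boto3

    # Hypothetical endpoint and variant names
    endpoint_name = 'my-endpoint'
    variant_name = 'AllTraffic'
    resource_id = f'endpoint/{endpoint_name}/variant/{variant_name}'

    # SageMaker endpoints scale through the Application Auto Scaling service
    autoscaling = boto3.client('application-autoscaling')

    # Register the variant's instance count as a scalable target (1 to 4 instances)
    autoscaling.register_scalable_target(
        ServiceNamespace='sagemaker',
        ResourceId=resource_id,
        ScalableDimension='sagemaker:variant:DesiredInstanceCount',
        MinCapacity=1,
        MaxCapacity=4)

    # Scale in and out to keep each instance near 1,000 invocations per minute
    autoscaling.put_scaling_policy(
        PolicyName='invocations-target-tracking',
        ServiceNamespace='sagemaker',
        ResourceId=resource_id,
        ScalableDimension='sagemaker:variant:DesiredInstanceCount',
        PolicyType='TargetTrackingScaling',
        TargetTrackingScalingPolicyConfiguration={
            'TargetValue': 1000.0,
            'PredefinedMetricSpecification': {
                'PredefinedMetricType': 'SageMakerVariantInvocationsPerInstance'}})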
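Likewise, here's a sketch of how you'd route a prediction request to one particular model hosted on a multi-model endpoint; the endpoint and artifact names are again hypothetical:

    import boto3

    smrt = boto3.client('sagemaker-runtime')

    # On a multi-model endpoint, TargetModel selects which model artifact
    # to load and invoke, relative to the S3 prefix configured on the endpoint
    response = smrt.invoke_endpoint(
        EndpointName='my-multi-model-endpoint',  # hypothetical name
        TargetModel='model-42.tar.gz',           # hypothetical artifact
        ContentType='text/csv',
        Body='0.1,0.2,0.3')
    print(response['Body'].read())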
We also worked with Amazon Elastic Inference, which lets you attach fractional GPU acceleration to a CPU-based instance, helping you find the right cost-performance ratio for your application. We then moved on to Amazon SageMaker Neo, an innovative capability that compiles models for a specific hardware architecture, whether that's an EC2 instance or an embedded device.
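As a reminder, here's a minimal sketch of launching a Neo compilation job with boto3; the job name, role ARN, and S3 paths are hypothetical placeholders:

    import boto3

    sm = boto3.client('sagemaker')

    # Compile a trained model for a specific hardware target
    sm.create_compilation_job(
        CompilationJobName='my-compilation-job',
        RoleArn='arn:aws:iam::123456789012:role/SageMakerRole',
        InputConfig={
            'S3Uri': 's3://my-bucket/model/model.tar.gz',
            # Name and shape of the model's input tensor
            'DataInputConfig': '{"input0": [1, 3, 224, 224]}',
            'Framework': 'PYTORCH'},
        OutputConfig={
            'S3OutputLocation': 's3://my-bucket/compiled/',
            # Compile for c5 instances; embedded targets such as
            # 'jetson_nano' work the same way
            'TargetDevice': 'ml_c5'},
        StoppingCondition={'MaxRuntimeInSeconds': 900})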
Finally, we built a cost optimization checklist that will come in handy in your upcoming SageMaker projects.
You've made it to the end of the book. Congratulations! You now know a lot about SageMaker. Go grab a dataset, build something cool, and let me know about it!