Summary
In this chapter, we learned how to run ML inference efficiently in the cloud using Amazon SageMaker. Building on the model we trained in the previous chapter—an IMDb movie review sentiment predictor—we demonstrated SageMaker's batch transform and real-time hosting options. More importantly, we learned how to optimize for cost and model latency through load testing. We also explored another significant cost-saving opportunity: hosting multiple ML models behind a single endpoint using SageMaker multi-model endpoints. Once you have selected the best inference option and instance type for your use case, SageMaker makes deploying your models straightforward. With these step-by-step instructions and the accompanying discussion, you will be able to apply what you've learned to your own ML use cases.
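To make the multi-model endpoint idea concrete, here is a minimal sketch of how a client selects a model at invocation time: with boto3's SageMaker runtime client, the `TargetModel` parameter of `invoke_endpoint` names the model artifact to serve the request. The endpoint name and artifact names below are hypothetical, and the actual AWS call is shown commented out since it requires credentials and a deployed endpoint.

```python
import json

def build_invoke_kwargs(endpoint_name, payload, target_model=None):
    """Build keyword arguments for sagemaker-runtime's invoke_endpoint.

    For a multi-model endpoint, TargetModel selects which model artifact
    (an S3 key relative to the endpoint's model prefix) serves the request;
    omit it for a regular single-model endpoint."""
    kwargs = {
        "EndpointName": endpoint_name,
        "ContentType": "application/json",
        "Body": json.dumps(payload),
    }
    if target_model is not None:
        # Hypothetical artifact name for one of the hosted models
        kwargs["TargetModel"] = target_model
    return kwargs

# Actual invocation (not executed here; needs AWS credentials and an endpoint):
# import boto3
# runtime = boto3.client("sagemaker-runtime")
# response = runtime.invoke_endpoint(
#     **build_invoke_kwargs(
#         "imdb-sentiment-mme",                 # hypothetical endpoint name
#         {"inputs": "A wonderful, moving film."},
#         target_model="imdb-sentiment-v2.tar.gz",  # hypothetical artifact
#     ))
# prediction = response["Body"].read()
```

Because every request names its target model, a fleet of infrequently used models can share one instance, which is where the cost savings of multi-model endpoints come from.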
In the next chapter, we will take a different route and learn how SageMaker JumpStart and SageMaker Autopilot can jump-start your ML journey. SageMaker JumpStart offers...