Summary
In this chapter, we focused on several options and solutions for deploying models using SageMaker. We deployed a pre-trained model into three different types of inference endpoints: (1) a real-time inference endpoint, (2) a serverless inference endpoint, and (3) an asynchronous inference endpoint. We also discussed the differences between these approaches, along with when each option is best used for deploying ML models. Toward the end of this chapter, we covered several deployment strategies, along with best practices when using SageMaker for model deployments.
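As a quick recap of how the three endpoint types differ in practice, the sketch below shows the general shape of the `create_endpoint_config` request for each option, as it would be passed to the SageMaker API via `boto3`. Note that the model name, instance type, memory size, and S3 bucket here are hypothetical placeholders, not values from this chapter:

```python
# Illustrative request shapes for the three endpoint types discussed in
# this chapter (parameters for the SageMaker create_endpoint_config API).
# "my-model", "ml.m5.xlarge", and the S3 bucket are placeholder values.

# (1) Real-time: a persistent fleet of instances serves requests with
# low latency; capacity is fixed by the instance count.
realtime_config = {
    "EndpointConfigName": "realtime-config",
    "ProductionVariants": [{
        "VariantName": "AllTraffic",
        "ModelName": "my-model",
        "InstanceType": "ml.m5.xlarge",
        "InitialInstanceCount": 1,
    }],
}

# (2) Serverless: no instances to manage; capacity is declared through
# memory size and maximum concurrency, and scales down when idle.
serverless_config = {
    "EndpointConfigName": "serverless-config",
    "ProductionVariants": [{
        "VariantName": "AllTraffic",
        "ModelName": "my-model",
        "ServerlessConfig": {
            "MemorySizeInMB": 2048,
            "MaxConcurrency": 5,
        },
    }],
}

# (3) Asynchronous: requests are queued and results are written to S3,
# which suits large payloads and long-running inference jobs.
async_config = {
    "EndpointConfigName": "async-config",
    "ProductionVariants": [{
        "VariantName": "AllTraffic",
        "ModelName": "my-model",
        "InstanceType": "ml.m5.xlarge",
        "InitialInstanceCount": 1,
    }],
    "AsyncInferenceConfig": {
        "OutputConfig": {"S3OutputPath": "s3://my-bucket/async-results/"},
    },
}
```

Each of these dictionaries would then be passed to a `boto3` SageMaker client, for example `sagemaker_client.create_endpoint_config(**serverless_config)`, before creating the endpoint itself.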
In the next chapter, we will dive deeper into SageMaker Model Registry and SageMaker Model Monitor, which are capabilities of SageMaker that can help us manage and monitor our models in production.