Summary
We defined model deployment as integrating your model into a client application. We discussed the characteristics of data science teams that commonly deploy their own models, versus those that specialize in more general analysis, and introduced a variety of use cases where model deployment is critical to the entire application. While noting a variety of hybrid architectures, we focused explicitly on deployments in the cloud. We reviewed some of the best ways to host your models, including SageMaker options such as real-time endpoints, batch transform, notebook jobs, asynchronous endpoints, multi-model endpoints, and serverless endpoints. We learned about options for reducing the size of your model, from compilation to distillation and quantization. Finally, we covered distributed model hosting and closed with a review of model servers and end-to-end hosting optimization tips on SageMaker.
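To make the quantization idea from the recap concrete, here is a minimal, framework-free sketch of affine (scale and zero-point) quantization, the core trick behind shrinking float32 weights into 8-bit integers. The function names and example values are illustrative only; real toolchains (for example, SageMaker Neo or PyTorch's quantization workflows) apply this per layer with calibration data.

```python
def quantize(weights, num_bits=8):
    """Affine-quantize a list of floats to signed num_bits integers.

    Returns the integer values plus the (scale, zero_point) pair
    needed to map them back to floats.
    """
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # avoid div-by-zero for constant tensors
    zero_point = round(qmin - lo / scale)
    q = [max(qmin, min(qmax, round(w / scale) + zero_point)) for w in weights]
    return q, scale, zero_point


def dequantize(q, scale, zero_point):
    """Map quantized integers back to approximate float values."""
    return [(v - zero_point) * scale for v in q]


# Hypothetical weight values for illustration.
weights = [-0.52, 0.13, 0.0, 0.91, -0.27]
q, scale, zp = quantize(weights)
recovered = dequantize(q, scale, zp)

# Each recovered value lands within one quantization step of the original,
# while the stored representation is 4x smaller than float32.
assert all(abs(a - b) <= scale for a, b in zip(weights, recovered))
```

The same scale/zero-point bookkeeping is what lets int8 inference kernels recover approximate float activations at runtime; the precision loss is bounded by the quantization step size.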
Next up, we’ll dive into a set of techniques you...