Summary
In this chapter, we discussed various Google Cloud services for hosting ML models, such as Vertex AI, Cloud Functions, GKE, and Cloud Run. We distinguished between online and offline model serving: online serving handles real-time predictions, while offline serving handles batch predictions. We then explored common challenges in deploying ML models, such as data/model drift, scaling, monitoring, performance, and keeping models up to date. We also introduced specific components of Vertex AI that make it easier to deploy and manage models, such as the Vertex AI Model Registry, the Vertex AI prediction service, and Vertex AI Model Monitoring.
Specifically, we took a deep dive into monitoring models in production, focusing on data drift and model drift, and discussed mechanisms to combat these drifts, such as automated continuous training.
Next, we explained A/B testing for comparing two versions of a model, and we discussed optimizing ML models...