Summary
Our goal in this chapter was to explain why you need to monitor an endpoint running a DL model and to introduce popular tools in this domain. These tools are designed to collect a set of metrics from an endpoint and to raise an alert when a monitored metric changes suddenly. The tools we covered are CloudWatch, Prometheus, Grafana, Datadog, SageMaker Clarify, PagerDuty, and Dynatrace. For completeness, we also looked at how CloudWatch can be integrated with SageMaker and EKS to monitor both an endpoint and model performance.
In the next chapter, the last of this book, we will explore the process of evaluating a completed project and discuss potential improvements.