Monitoring an EKS endpoint using CloudWatch
In Chapter 9, Scaling a Deep Learning Pipeline, we described EKS-based endpoints alongside SageMaker endpoints. In this section, we describe the CloudWatch-based monitoring available for EKS. First, we will learn how to log container-level metrics from an EKS cluster for monitoring. Next, we will explain how to log model-related metrics from an EKS inference endpoint.
Let’s first look at how to set up CloudWatch to monitor an EKS cluster. The simplest approach is to deploy the CloudWatch agent to the cluster (the AWS quickstart runs it as a DaemonSet, so one agent pod collects metrics from the containers on each node). Additionally, you can install Fluent Bit (www.fluentbit.io), an open source log processor and forwarder that sends container logs to CloudWatch Logs. For a complete explanation of the CloudWatch agent and Fluent Bit, please read https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/Container-Insights-setup-EKS-quickstart.html.
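The quickstart linked above downloads a single manifest template for the CloudWatch agent and Fluent Bit, fills in a few placeholders with `sed`, and pipes the result to `kubectl apply -f -`. The following Python sketch mirrors that substitution step so the values are easier to see. It is a minimal illustration, not part of the AWS tooling: the helper names `quickstart_values` and `render_manifest` are hypothetical, and the placeholder names (`{{cluster_name}}`, `{{region_name}}`, `{{http_server_toggle}}`, `{{http_server_port}}`, `{{read_from_head}}`, `{{read_from_tail}}`) are assumed from the quickstart template at the time of writing, so verify them against the AWS page before relying on them.

```python
from typing import Dict, Optional


def quickstart_values(cluster_name: str, region_name: str,
                      http_port: Optional[str] = "2020",
                      read_from_head: bool = False) -> Dict[str, str]:
    """Build placeholder -> value pairs for the (assumed) quickstart template."""
    # Fluent Bit reads existing log files either from the head or the tail;
    # exactly one of the two toggles is "On".
    head = "On" if read_from_head else "Off"
    tail = "Off" if read_from_head else "On"
    # Fluent Bit's HTTP monitoring server is enabled only when a port is set.
    server = "On" if http_port else "Off"
    return {
        "{{cluster_name}}": cluster_name,
        "{{region_name}}": region_name,
        # These four values are written as quoted YAML strings.
        "{{http_server_toggle}}": f'"{server}"',
        "{{http_server_port}}": f'"{http_port or ""}"',
        "{{read_from_head}}": f'"{head}"',
        "{{read_from_tail}}": f'"{tail}"',
    }


def render_manifest(template: str, values: Dict[str, str]) -> str:
    """Replace every occurrence of each placeholder in the template text."""
    for placeholder, value in values.items():
        template = template.replace(placeholder, value)
    return template


if __name__ == "__main__":
    # Hypothetical two-line fragment standing in for the real manifest.
    fragment = "cluster.name: {{cluster_name}}\nhttp.port: {{http_server_port}}"
    print(render_manifest(fragment, quickstart_values("my-cluster", "us-east-1")))
```

In practice, you would render the full downloaded template and pass the output to `kubectl apply -f -`, as the quickstart does. Unlike `sed 's/.../.../'`, which replaces only the first match on each line, `str.replace` replaces every occurrence. This makes no difference as long as each placeholder appears at most once per line.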
Another option is to persist the default metrics sent by the EKS control plane. This can be easily enabled from...