Monitoring and Troubleshooting an Application Running in Production
In the previous chapter, we surveyed the three most popular ways of running containerized applications in the cloud: AWS EKS, Azure AKS, and Google GKE. We explored each of these hosted solutions and discussed their pros and cons.
This chapter looks at techniques for instrumenting and monitoring an individual service or a whole distributed application running on a Kubernetes cluster. You will be introduced to the concept of alerting based on key metrics. The chapter also shows how to troubleshoot an application service running in production without altering the cluster or the cluster nodes on which the service runs.
This chapter covers the following topics:
- Monitoring an individual service
- Using OpenTracing for distributed tracing
- Leveraging Prometheus and Grafana to monitor a distributed application
- Defining alerts based...