Common K8s tools/techniques for troubleshooting EKS
Any troubleshooting process begins with understanding the problem and distinguishing its symptoms from its root cause. Symptoms are often mistaken for the root cause, so troubleshooting tends to be iterative: you test and observe repeatedly, narrowing in on the actual problem while setting aside symptoms and false positives.
Once you understand the problem, the next step is to decide whether to mitigate, solve, or ignore it. Not all problems can be solved then and there, so you may need a strategy to work around them for the time being.
The final stage is resolving the problem. This might require an update to your cluster, your application code, or both. Depending on the nature of the problem, the number of clusters you manage, and the impact on users or customers, this may be quite an involved process.
Generally, I use the following checklist when trying...