Troubleshooting scenarios
There are so many things that can go wrong in a large Kubernetes cluster, and they will; this is expected. You can follow best practices and minimize some of them (mostly human errors) by using stricter processes. However, some issues, such as hardware failures and networking issues, can't be avoided entirely. Even human errors shouldn't always be minimized if doing so means slower development. In this section, we'll discuss various categories of failures, how to detect them, how to evaluate their impact, and how to choose the proper response.