In this chapter, we will focus on the operational side of running a large-scale distributed system on Kubernetes, as well as on how to design the system and what to take into account to maintain a strong operational posture. Even so, things will go wrong, and you must be ready to detect, troubleshoot, and respond to problems as quickly as possible. The operational best practices that Kubernetes provides out of the box include the following:
- Self-healing
- Autoscaling
- Resource management
However, the cluster administrator and the developers must understand how these capabilities work, how to configure them, and how they interact in order to use them properly. There is always a balancing act between high availability, robustness, performance, security, and cost. It's also important to realize that all of these factors and the relationships between them change...