In the previous section, we looked at live cluster upgrades. We explored various techniques and how Kubernetes supports them, and we discussed difficult problems such as breaking changes, data contract changes, data migration, and API deprecation. In this section, we will consider the options and configurations available for large clusters with different reliability and high-availability properties. When you design your cluster, you need to understand these options and choose among them based on the needs of your organization.
We will cover a range of availability requirements, from best effort all the way to the holy grail of zero downtime. For each category of availability, we will consider what it means in terms of performance and cost.