Autoscaling Kubernetes Clusters
Kubernetes clusters are designed to run scalable applications reliably. In other words, if the Kubernetes cluster runs 10 instances of your application today, it should also support running 100 instances in the future. There are two mainstream methods to reach this level of flexibility: redundancy and autoscaling. Let's assume that the 10 instances of your application are running on 3 servers in your cluster. At the same density, 100 instances would need 30 servers, so with redundancy you need at least 27 extra idle servers to be capable of running 100 instances in the future. It also means paying for the empty servers, as well as their operational and maintenance costs. With autoscaling, you need automated procedures to create or remove servers on demand. Autoscaling avoids keeping excess idle servers and minimizes costs while still meeting the scalability requirements.
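The server count in the redundancy scenario can be checked with a short calculation. The following is a minimal sketch that assumes the number of servers grows in proportion to the number of instances (the same 10-instances-per-3-servers density); the function names are hypothetical and exist only for this illustration.

```python
def servers_needed(target_instances, current_instances, current_servers):
    """Servers required for target_instances, assuming the current
    instance-to-server density stays the same (an assumption of this sketch)."""
    # Integer ceiling division: a partially used server still counts as a whole one.
    total = target_instances * current_servers
    return (total + current_instances - 1) // current_instances


def extra_idle_servers(target_instances, current_instances, current_servers):
    """Servers that must be kept idle today to absorb the future load."""
    needed = servers_needed(target_instances, current_instances, current_servers)
    # No extra servers are needed if the target fits on the current ones.
    return max(needed - current_servers, 0)


if __name__ == "__main__":
    print(servers_needed(100, 10, 3))      # 30
    print(extra_idle_servers(100, 10, 3))  # 27
```

Under the redundancy approach, those 27 servers are paid for whether or not the extra instances ever run, which is the cost that autoscaling is meant to avoid.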
GKE Cluster Autoscaler is the out-of-the-box solution for handling autoscaling in Kubernetes clusters. When it is enabled, it automatically...