Autoscaling Pods horizontally using a Horizontal Pod Autoscaler
While a VPA acts as an optimizer of resource usage, the actual scaling of Deployments and StatefulSets that run multiple Pod replicas is done with a Horizontal Pod Autoscaler (HPA). At a high level, the goal of the HPA is to automatically scale the number of replicas in a Deployment or StatefulSet depending on the current CPU utilization or on other custom metrics (several metrics can be used at once). The details of the algorithm that determines the target number of replicas based on metric values can be found here: https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#algorithm-details. HPAs are highly configurable; in this chapter, we will cover a standard scenario in which we autoscale based on target CPU usage.
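To give a feel for how the replica count follows from a metric value, the following is a minimal Python sketch of the core ratio calculation described in the Kubernetes documentation linked above. The function name desired_replicas and its parameters are hypothetical and exist only for illustration; the tolerance of 0.1 is assumed to match the controller default. The real controller also accounts for missing metrics, Pod readiness, and stabilization windows, none of which are modeled here.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    min_replicas: int = 1,
    max_replicas: int = 10,
    tolerance: float = 0.1,
) -> int:
    """Illustrative sketch of the HPA ratio calculation (not the real controller)."""
    # Ratio of the observed metric to the target, e.g. 90% CPU vs. a 50% target.
    ratio = current_metric / target_metric

    # Assumption: when the ratio is close enough to 1.0, no scaling happens.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas

    # Core formula: ceil(currentReplicas * currentMetricValue / desiredMetricValue).
    desired = math.ceil(current_replicas * ratio)

    # Clamp to the configured bounds (minReplicas / maxReplicas in the HPA spec).
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # 4 replicas at 90% average CPU with a 50% target.
    print(desired_replicas(4, 90, 50))
    # 4 replicas at 20% average CPU with a 50% target.
    print(desired_replicas(4, 20, 50))
```

In this sketch, a workload running hotter than its target gains replicas and one running cooler loses them, while small deviations inside the tolerance band leave the replica count unchanged to avoid constant churn.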
Important note
An HPA is represented by a built-in HorizontalPodAutoscaler API resource in Kubernetes in the autoscaling API group. The current stable version...