Autoscaling Pods horizontally using a HorizontalPodAutoscaler
While a VPA acts like an optimizer of resource usage, the true scaling of your Deployments and StatefulSets that run multiple Pod replicas can be done using an HPA. At a high level, the goal of the HPA is to automatically scale the number of replicas in a Deployment or StatefulSet depending on the current CPU utilization or other custom metrics (including multiple metrics at once). The details of the algorithm that determines the target number of replicas based on metric values can be found here: https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#algorithm-details.
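To make the replica calculation more concrete, the following is a minimal Python sketch of the core formula described in the linked documentation: the desired replica count is the current count multiplied by the ratio of the current metric value to the target value, rounded up. The function names and the sample numbers are hypothetical and used only for illustration. The sketch assumes the default tolerance of 10% and ignores details that the real controller handles, such as Pod readiness, missing metrics, stabilization windows, and the minReplicas and maxReplicas bounds.

```python
import math


def desired_replicas(current_replicas, current_value, target_value, tolerance=0.1):
    """Simplified HPA calculation for a single metric (illustrative only)."""
    if target_value <= 0:
        raise ValueError("target_value must be positive")
    ratio = current_value / target_value
    # Skip scaling when the ratio is close enough to 1.0
    # (assumed default tolerance of 10%).
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


def desired_replicas_multi(current_replicas, metrics, tolerance=0.1):
    """With several metrics, the largest proposed replica count is used.

    `metrics` is a list of (current_value, target_value) pairs.
    """
    return max(
        desired_replicas(current_replicas, current, target, tolerance)
        for current, target in metrics
    )


if __name__ == "__main__":
    # 3 replicas averaging 90% CPU against a 60% target.
    print(desired_replicas(3, 90, 60))
    # CPU suggests scaling up, a second (hypothetical) metric suggests scaling down.
    print(desired_replicas_multi(3, [(90, 60), (100, 200)]))
```

In this sketch, a Deployment with 3 replicas at 90% average CPU utilization and a 60% target would be scaled to 5 replicas. When multiple metrics are configured, each metric is evaluated separately and the highest resulting replica count is chosen.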
Not all applications will work equally efficiently with HPAs and VPAs. Some of them might work better with one method, while others might not support autoscaling at all or might even perform worse when it is enabled. Always analyze your application behavior prior to using any autoscaling approach.
A high-level diagram to demonstrate...