Our goal is to deploy an application that will be automatically scaled (or de-scaled) depending on its use of resources. We'll start by deploying the app and discuss how to accomplish auto-scaling later.
Let's take a look at a definition of the application we'll use in our examples.
```bash
cat scaling/go-demo-5-no-sidecar-mem.yml
```
If you are familiar with Kubernetes, the YAML definition should be self-explanatory. We'll comment only on the parts that are relevant for auto-scaling.
The output, limited to the relevant parts, is as follows.
```yaml
...
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: db
  namespace: go-demo-5
spec:
  ...
  template:
    ...
    spec:
      ...
      containers:
      - name: db
        ...
        resources:
          limits:
            memory: "150Mi"
            cpu: 0.2
          requests:
            memory: "100Mi"
            cpu: 0.1
        ...
      - name: db-sidecar
        ...
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
  namespace: go-demo-5
spec:
  ...
  template:
    ...
    spec:
      containers:
      - name: api
        ...
        resources:
          limits:
            memory: 15Mi
            cpu: 0.1
          requests:
            memory: 10Mi
            cpu: 0.01
...
```
The application consists of two components: the api Deployment is a backend API that uses the db StatefulSet for its state.
The essential parts of the definition are the resources sections. Both the api and the db containers have requests and limits defined for memory and CPU. The database Pod also runs a sidecar container that joins MongoDB replicas into a replica set. Please note that, unlike the other containers, the sidecar does not have resources defined. Why that matters will become clear later. For now, just remember that two containers have their requests and limits defined, and that one doesn't.
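The resources values use Kubernetes quantity notation. CPU is measured in cores, so 0.1 is the same as 100m (a tenth of a core), and memory is measured in bytes with suffixes such as Mi (mebibytes, 2^20 bytes). As an illustration, the following Python sketch converts a few of the values above into plain numbers. `parse_quantity` is a hypothetical helper, not part of any Kubernetes client, and it handles only a small subset of the quantity format.

```python
from fractions import Fraction

# Suffixes handled by this simplified sketch (a subset of the Kubernetes
# quantity format): binary multiples, decimal multiples, and milli.
_SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30,
    "k": 10**3, "M": 10**6, "G": 10**9,
    "m": Fraction(1, 1000),
}


def parse_quantity(value) -> Fraction:
    """Convert a quantity such as "150Mi", "100m" or 0.2 into an exact number."""
    text = str(value).strip()
    # Check two-letter suffixes first so that "Mi" is not mistaken for "M".
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if text.endswith(suffix):
            return Fraction(text[: -len(suffix)]) * _SUFFIXES[suffix]
    return Fraction(text)


# Values taken from the db and api containers above.
print(parse_quantity("150Mi"))      # bytes: 157286400
print(parse_quantity(0.2) * 1000)   # millicores: 200
print(parse_quantity(0.01) * 1000)  # millicores: 10
```

Exact fractions are used so that values like 0.1 cores convert to exactly 100 millicores without floating-point noise.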
Now, let's create those resources.
```bash
kubectl apply \
    -f scaling/go-demo-5-no-sidecar-mem.yml \
    --record
```
The output should show that quite a few resources were created. Our next action is to wait until the api Deployment is rolled out, thus confirming that the application is up and running.
```bash
kubectl -n go-demo-5 \
    rollout status \
    deployment api
```
After a few moments, you should see the message stating that deployment "api" was successfully rolled out.
To be on the safe side, we'll list the Pods in the go-demo-5 Namespace and confirm that one replica of each is running.
```bash
kubectl -n go-demo-5 get pods
```
The output is as follows.
```
NAME    READY STATUS  RESTARTS AGE
api-... 1/1   Running 0        1m
db-0    2/2   Running 0        1m
```
So far, we haven't done anything beyond the ordinary creation of a StatefulSet and a Deployment.
The Deployment, in turn, created a ReplicaSet, which created the api Pod, while the StatefulSet created its db-0 Pod directly.
As you hopefully know, we should aim for at least two replicas of each Pod, as long as they are scalable. Yet neither of the two has replicas defined, so each defaults to a single replica. That is intentional. The fact that we can specify the number of replicas of a Deployment or a StatefulSet does not mean that we should. At least, not always.
Let's take a look at a simple example of a HorizontalPodAutoscaler.
```bash
cat scaling/go-demo-5-api-hpa.yml
```
The output is as follows.
```yaml
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: api
  namespace: go-demo-5
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  minReplicas: 2
  maxReplicas: 5
  metrics:
  - type: Resource
    resource:
      name: cpu
      targetAverageUtilization: 80
  - type: Resource
    resource:
      name: memory
      targetAverageUtilization: 80
```
The definition creates a HorizontalPodAutoscaler that targets the api Deployment. Its boundaries are a minimum of two and a maximum of five replicas. Those limits are fundamental. Without them, we'd run the risk of scaling up to infinity or scaling down to zero replicas. The minReplicas and maxReplicas fields are a safety net.
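Conceptually, the two fields clamp whatever replica count the metrics would suggest. A minimal Python sketch of that idea (`clamp_replicas` is a hypothetical name used only for illustration):

```python
def clamp_replicas(desired: int, min_replicas: int, max_replicas: int) -> int:
    """Keep a replica recommendation inside the [minReplicas, maxReplicas] range."""
    if max_replicas < min_replicas:
        raise ValueError("maxReplicas must not be lower than minReplicas")
    return max(min_replicas, min(desired, max_replicas))


# With the api HPA bounds (2 to 5):
print(clamp_replicas(0, 2, 5))   # 2: never below the minimum
print(clamp_replicas(40, 2, 5))  # 5: never above the maximum
print(clamp_replicas(3, 2, 5))   # 3: within bounds, unchanged
```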
The key section of the definition is metrics. It defines the criteria Kubernetes uses to decide whether it should scale (or de-scale) a resource. In our case, we're using Resource type entries. They target an average utilization of eighty percent for both memory and CPU, where utilization is measured as a percentage of the containers' requests. If the actual usage of either of the two deviates from its target, Kubernetes will scale (or de-scale) the resource.
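The Kubernetes documentation describes the core calculation as desiredReplicas = ceil(currentReplicas × currentUtilization / targetUtilization). The HPA skips the change when that ratio is within a tolerance of 1.0 (0.1 by default), and when several metrics are listed it acts on the largest proposal. The Python sketch below models only that arithmetic, with hypothetical function names and made-up numbers. The real controller also accounts for Pods that are not ready or have no metrics yet.

```python
import math

# Default of the kube-controller-manager flag --horizontal-pod-autoscaler-tolerance.
TOLERANCE = 0.1


def replicas_for_metric(current: int, utilization: float, target: float) -> int:
    """Replica count proposed by a single utilization metric."""
    ratio = utilization / target
    # Close enough to the target: keep the current number of replicas.
    if abs(ratio - 1.0) <= TOLERANCE:
        return current
    return math.ceil(current * ratio)


def replicas_for_metrics(current: int, metrics: list[tuple[float, float]]) -> int:
    """With several metrics, the HPA acts on the largest proposal."""
    return max(replicas_for_metric(current, u, t) for u, t in metrics)


# Four replicas: CPU at 20% and memory at 90%, both targeting 80%.
print(replicas_for_metrics(4, [(20, 80), (90, 80)]))  # 5 (memory wins)
```

Taking the maximum is the conservative choice: no metric is left above its target just because another metric would be happy with fewer replicas.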
Please note that we used the v2beta1 version of the API, and you might be wondering why we chose it instead of the stable and production-ready v1. After all, beta1 releases are still far from being polished enough for general usage. The reason is simple. HorizontalPodAutoscaler v1 is too basic. It only allows scaling based on CPU. Even our simple example goes beyond that by adding memory to the mix. Later on, we'll extend it even more. So, while v1 is considered stable, it does not provide much value, and we can either wait until v2 is released or start experimenting with v2beta releases right away. We're opting for the latter. By the time you read this, more stable releases are likely to exist and to be supported in your Kubernetes cluster (Kubernetes 1.25 and later no longer serve autoscaling/v2beta1 at all). If that's the case, feel free to switch to them before applying the definition, keeping in mind that newer versions also changed the format of the metrics entries, so changing apiVersion alone is not enough.
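In autoscaling/v2beta2 and the stable autoscaling/v2, each Resource entry moved from targetAverageUtilization to a target block with `type: Utilization` and `averageUtilization`. The scaleTargetRef, minReplicas, and maxReplicas fields keep their shape. The following Python sketch rewrites one metrics entry, treated as an already-parsed dictionary, from the v2beta1 shape into the v2 shape. `to_autoscaling_v2` is a hypothetical helper and only handles the utilization-based Resource metrics used in this chapter.

```python
def to_autoscaling_v2(metric: dict) -> dict:
    """Rewrite a v2beta1 Resource metric entry into the autoscaling/v2 shape."""
    if metric.get("type") != "Resource":
        raise ValueError("this sketch only handles Resource metrics")
    resource = metric["resource"]
    return {
        "type": "Resource",
        "resource": {
            "name": resource["name"],
            "target": {
                "type": "Utilization",
                "averageUtilization": resource["targetAverageUtilization"],
            },
        },
    }


# The cpu entry from go-demo-5-api-hpa.yml, as a parsed dictionary.
v2beta1_cpu = {"type": "Resource",
               "resource": {"name": "cpu", "targetAverageUtilization": 80}}
print(to_autoscaling_v2(v2beta1_cpu))
```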
Now let's apply it.
```bash
kubectl apply \
    -f scaling/go-demo-5-api-hpa.yml \
    --record
```
We applied the definition that created the HorizontalPodAutoscaler (HPA). Next, we'll take a look at the information we'll get by retrieving the HPA resources.
```bash
kubectl -n go-demo-5 get hpa
```
If you were quick, the output should be similar to the one that follows.
```
NAME REFERENCE      TARGETS                      MINPODS MAXPODS REPLICAS AGE
api  Deployment/api <unknown>/80%, <unknown>/80% 2       5       0        20s
```
We can see that Kubernetes does not yet have the actual CPU and memory utilization and that it output <unknown> instead. We need to give it a bit more time until the next iteration of data gathering from the Metrics Server. Get yourself some coffee before we repeat the same query.
```bash
kubectl -n go-demo-5 get hpa
```
This time, the output is without unknowns.
```
NAME REFERENCE      TARGETS          MINPODS MAXPODS REPLICAS AGE
api  Deployment/api 38%/80%, 10%/80% 2       5       2        1m
```
We can see that both CPU and memory utilization are way below the target utilization of 80%. Still, Kubernetes increased the number of replicas from one to two because that's the minimum we defined. We made a contract stating that the api Deployment should never have fewer than two replicas, and Kubernetes complied with it by scaling up even though the resource utilization is way below the target average utilization. We can confirm that behavior through the events of the HorizontalPodAutoscaler.
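The effect of the minimum can be shown with the same kind of arithmetic the HPA uses. The simplified Python sketch below starts from the single replica we had before the HPA existed and uses the utilization figures from the output above purely as an illustration. `desired_replicas` is a hypothetical helper that leaves out the tolerance band and other details of the real controller.

```python
import math


def desired_replicas(current: int, metrics: list[tuple[float, float]],
                     min_replicas: int, max_replicas: int) -> int:
    """Per-metric ratio, largest proposal, then clamp to the HPA bounds."""
    proposals = [math.ceil(current * usage / target) for usage, target in metrics]
    return max(min_replicas, min(max(proposals), max_replicas))


# One replica, CPU at 38% and memory at 10%, both targeting 80%.
print(desired_replicas(1, [(38, 80), (10, 80)], min_replicas=2, max_replicas=5))  # 2
```

Both metrics alone would be satisfied by a single replica, so the result of 2 comes entirely from minReplicas.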
```bash
kubectl -n go-demo-5 describe hpa api
```
The output, limited to the event messages, is as follows.
```
...
Events:
... Message
... -------
... New size: 2; reason: Current number of replicas below Spec.MinReplicas
```
The message of the event should be self-explanatory. The HorizontalPodAutoscaler changed the number of replicas to 2 because the current number (1) was below the MinReplicas value.
Finally, we'll list the Pods to confirm that the desired number of replicas is indeed running.
```bash
kubectl -n go-demo-5 get pods
```
The output is as follows.
```
NAME    READY STATUS  RESTARTS AGE
api-... 1/1   Running 0        2m
api-... 1/1   Running 0        6m
db-0    2/2   Running 0        6m
```
So far, the HPA has not performed any auto-scaling based on resource usage. Instead, it only increased the number of Pods to meet the specified minimum. It did that by updating the Deployment.
Next, we'll try to create another HorizontalPodAutoscaler but, this time, we'll target the StatefulSet that runs our MongoDB. So, let's take a look at yet another YAML definition.
```bash
cat scaling/go-demo-5-db-hpa.yml
```
The output is as follows.
```yaml
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: db
  namespace: go-demo-5
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: db
  minReplicas: 3
  maxReplicas: 5
  metrics:
  - type: Resource
    resource:
      name: cpu
      targetAverageUtilization: 80
  - type: Resource
    resource:
      name: memory
      targetAverageUtilization: 80
```
That definition is almost the same as the one we used before. The only differences are that this time we're targeting the StatefulSet called db and that the minimum number of replicas is 3.
Let's apply it.
```bash
kubectl apply \
    -f scaling/go-demo-5-db-hpa.yml \
    --record
```
Let's take another look at the HorizontalPodAutoscaler resources.
```bash
kubectl -n go-demo-5 get hpa
```
The output is as follows.
```
NAME REFERENCE      TARGETS                      MINPODS MAXPODS REPLICAS AGE
api  Deployment/api 41%/80%, 0%/80%              2       5       2        5m
db   StatefulSet/db <unknown>/80%, <unknown>/80% 3       5       0        20s
```
We can see that the second HPA was created and that its current utilization is unknown. This looks like the same situation as before. Should we give it some time for data to start flowing in? Wait for a few moments and retrieve the HPAs again. Are the targets still unknown?
If the utilization remains unknown, something is wrong. Let's describe the newly created HPA and see whether we can find the cause of the issue.
```bash
kubectl -n go-demo-5 describe hpa db
```
The output, limited to the event messages, is as follows.
```
...
Events:
... Message
... -------
... New size: 3; reason: Current number of replicas below Spec.MinReplicas
... missing request for memory on container db-sidecar in pod go-demo-5/db-0
... failed to get memory utilization: missing request for memory on container db-sidecar in pod go-demo-5/db-0
```
If we focus on the first message, we can see that it started well. HPA detected that the current number of replicas was below the limit and increased it to three. That is the expected behavior, so let's move on to the other two messages.
HPA could not calculate memory utilization because we did not specify a memory request for the db-sidecar container. Utilization is expressed as a percentage of the request, so without a request HPA has nothing to compare the actual memory usage against. In other words, we missed specifying resources for the db-sidecar container, and HPA could not do its work. We'll fix that by applying go-demo-5-no-hpa.yml.
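For Resource metrics, the HPA compares the usage of all of a Pod's containers with the sum of their requests (and then aggregates across all the target's Pods), which is why a single container without a request is enough to break the calculation. The Python sketch below illustrates that logic for one Pod. `pod_utilization` is a hypothetical helper, and the usage figures (in MiB) are made up.

```python
def pod_utilization(containers: list[dict], resource: str) -> float:
    """Usage of all containers as a percentage of their summed requests."""
    total_usage = 0.0
    total_request = 0.0
    for container in containers:
        request = container.get("requests", {}).get(resource)
        if request is None:
            # No request means no baseline, so no percentage can be computed.
            raise ValueError(
                f"missing request for {resource} on container {container['name']}")
        total_usage += container["usage"][resource]
        total_request += request
    return 100 * total_usage / total_request


# Hypothetical memory figures in MiB for the db-0 Pod.
db = {"name": "db", "requests": {"memory": 100}, "usage": {"memory": 60}}
sidecar = {"name": "db-sidecar", "usage": {"memory": 30}}

try:
    pod_utilization([db, sidecar], "memory")
except ValueError as err:
    print(err)  # missing request for memory on container db-sidecar

sidecar["requests"] = {"memory": 50}  # the request added in go-demo-5-no-hpa.yml
print(pod_utilization([db, sidecar], "memory"))  # 60.0
```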
Let's take a quick look at the new definition.
```bash
cat scaling/go-demo-5-no-hpa.yml
```
The output, limited to the relevant parts, is as follows.
```yaml
...
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: db
  namespace: go-demo-5
spec:
  ...
  template:
    ...
    spec:
      ...
      - name: db-sidecar
        ...
        resources:
          limits:
            memory: "100Mi"
            cpu: 0.2
          requests:
            memory: "50Mi"
            cpu: 0.1
...
```
The only noticeable difference, when compared with the initial definition, is that this time we defined the resources for the db-sidecar container. Let's apply it.
```bash
kubectl apply \
    -f scaling/go-demo-5-no-hpa.yml \
    --record
```
Next, we'll wait a few moments for the changes to take effect before retrieving the HPAs again.
```bash
kubectl -n go-demo-5 get hpa
```
This time, the output is more promising.
```
NAME REFERENCE      TARGETS          MINPODS MAXPODS REPLICAS AGE
api  Deployment/api 66%/80%, 10%/80% 2       5       2        16m
db   StatefulSet/db 60%/80%, 4%/80%  3       5       3        10m
```
Both HPAs are showing the current and the target resource usage. Neither reached the target values, so HPA is maintaining the minimum number of replicas. We can confirm that by listing all the Pods in the go-demo-5 Namespace.
```bash
kubectl -n go-demo-5 get pods
```
The output is as follows.
```
NAME    READY STATUS  RESTARTS AGE
api-... 1/1   Running 0        42m
api-... 1/1   Running 0        46m
db-0    2/2   Running 0        33m
db-1    2/2   Running 0        33m
db-2    2/2   Running 0        33m
```
We can see that there are two Pods for the api Deployment and three replicas of the db StatefulSet. Those numbers are equivalent to the spec.minReplicas entries in the HPA definitions.
Let's see what happens when the actual memory usage is above the target value.
We'll modify the definition of one of the HPAs by lowering one of the targets as a way to reproduce the situation in which our Pods are consuming more resources than desired.
Let's take a look at a modified HPA definition.
```bash
cat scaling/go-demo-5-api-hpa-low-mem.yml
```
The output, limited to the relevant parts, is as follows.
```yaml
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: api
  namespace: go-demo-5
spec:
  ...
  metrics:
  ...
  - type: Resource
    resource:
      name: memory
      targetAverageUtilization: 10
```
We decreased targetAverageUtilization to 10. That will surely be below the current memory utilization, and we'll be able to witness HPA in action. Let's apply the new definition.
```bash
kubectl apply \
    -f scaling/go-demo-5-api-hpa-low-mem.yml \
    --record
```
Please wait a few moments for the next iteration of data gathering to occur, and retrieve the HPAs.
```bash
kubectl -n go-demo-5 get hpa
```
The output is as follows.
```
NAME REFERENCE      TARGETS          MINPODS MAXPODS REPLICAS AGE
api  Deployment/api 49%/10%, 10%/80% 2       5       2        44m
db   StatefulSet/db 64%/80%, 5%/80%  3       5       3        39m
```
We can see that the actual memory utilization of the api Deployment (49%) is way above the threshold (10%). However, the number of replicas is still the same (2). We'll have to wait a few more minutes before we retrieve the HPAs again.
```bash
kubectl -n go-demo-5 get hpa
```
This time, the output is slightly different.
```
NAME REFERENCE      TARGETS          MINPODS MAXPODS REPLICAS AGE
api  Deployment/api 49%/10%, 10%/80% 2       5       4        44m
db   StatefulSet/db 64%/80%, 5%/80%  3       5       3        39m
```
We can see that the number of replicas increased to 4. HPA changed the Deployment, and that produced the cascading effect that resulted in the increased number of Pods.
Let's describe the api HPA.
```bash
kubectl -n go-demo-5 describe hpa api
```
The output, limited to the messages of the events, is as follows.
```
...
Events:
... Message
... -------
... New size: 2; reason: Current number of replicas below Spec.MinReplicas
... New size: 4; reason: memory resource utilization (percentage of request) above target
```
We can see that the HPA changed the size to 4 because memory resource utilization (percentage of request) was above target.
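Why four rather than going straight to the maximum? One likely explanation, assuming the cluster ran a Kubernetes version older than 1.18, is that the controller capped each scale-up step at twice the current number of replicas or four, whichever was larger. Newer versions replace that fixed rule with configurable scaling policies (the behavior field) whose defaults differ. The Python sketch below applies the older rule, assuming memory utilization stays at the 49% shown earlier. The helper names are hypothetical.

```python
import math


def legacy_scale_up_limit(current: int) -> int:
    """Pre-1.18 cap on one scale-up step: double the replicas, or 4, whichever is larger."""
    return max(2 * current, 4)


def next_size(current: int, usage: float, target: float, max_replicas: int) -> int:
    proposal = math.ceil(current * usage / target)
    return min(proposal, legacy_scale_up_limit(current), max_replicas)


# api Deployment: memory at 49% against a 10% target, maxReplicas 5.
sizes = [2]
while True:
    nxt = next_size(sizes[-1], 49, 10, 5)
    if nxt == sizes[-1]:
        break
    sizes.append(nxt)
print(sizes)  # [2, 4, 5]
```

Under this rule, the first step is capped at four even though the raw formula asks for ten, and the second step is capped by maxReplicas.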
Since, in this case, increasing the number of replicas did not reduce memory consumption below the HPA target, we should expect that the HPA will continue scaling up the Deployment until it reaches the limit of 5. We'll confirm that assumption by waiting for a few minutes and describing the HPA one more time.
```bash
kubectl -n go-demo-5 describe hpa api
```
The output, limited to the messages of the events, is as follows.
```
...
Events:
... Message
... -------
... New size: 2; reason: Current number of replicas below Spec.MinReplicas
... New size: 4; reason: memory resource utilization (percentage of request) above target
... New size: 5; reason: memory resource utilization (percentage of request) above target
```
We got the message stating that the new size is now 5, thus proving that the HPA will continue scaling up until resource usage drops below the target or, as in our case, it reaches the maximum number of replicas.
We can confirm that scaling indeed worked by listing all the Pods in the go-demo-5 Namespace.
```bash
kubectl -n go-demo-5 get pods
```
The output is as follows.
```
NAME    READY STATUS  RESTARTS AGE
api-... 1/1   Running 0        47m
api-... 1/1   Running 0        51m
api-... 1/1   Running 0        4m
api-... 1/1   Running 0        4m
api-... 1/1   Running 0        24s
db-0    2/2   Running 0        38m
db-1    2/2   Running 0        38m
db-2    2/2   Running 0        38m
```
As we can see, there are indeed five replicas of the api Deployment.
HPA retrieved data from the Metrics Server, concluded that the actual resource usage was higher than the threshold, and updated the Deployment with the new number of replicas.
Next, we'll validate that de-scaling works as well. We'll do that by re-applying the initial definition that has both the memory and the CPU set to eighty percent. Since the actual memory usage is below that, the HPA should start scaling down until it reaches the minimum number of replicas.
```bash
kubectl apply \
    -f scaling/go-demo-5-api-hpa.yml \
    --record
```
Just as before, we'll wait for a few minutes before we describe the HPA.
```bash
kubectl -n go-demo-5 describe hpa api
```
The output, limited to the event messages, is as follows.
```
...
Events:
... Message
... -------
... New size: 2; reason: Current number of replicas below Spec.MinReplicas
... New size: 4; reason: memory resource utilization (percentage of request) above target
... New size: 5; reason: memory resource utilization (percentage of request) above target
... New size: 3; reason: All metrics below target
```
As we can see, it changed the size to 3 since all the metrics are below target.
A while later, it will de-scale again to two replicas and stop since that's the limit we set in the HPA definition.
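Scaling down is intentionally slower than scaling up. When the metrics call for fewer replicas, the controller uses the highest recommendation it computed within a stabilization window: five minutes by default, set through the kube-controller-manager flag `--horizontal-pod-autoscaler-downscale-stabilization` or, in newer versions, per HPA through `behavior.scaleDown.stabilizationWindowSeconds`. This avoids flapping when metrics fluctuate. A minimal Python sketch of the idea follows. `DownscaleStabilizer` is a hypothetical class, and the timestamps and recommendations are made up.

```python
from collections import deque


class DownscaleStabilizer:
    """Return the highest recommendation seen within the window, so a drop
    takes effect only after higher recommendations have aged out."""

    def __init__(self, window_seconds: float = 300.0):
        self.window = window_seconds
        self.history = deque()  # (timestamp, recommendation) pairs, oldest first

    def stabilize(self, now: float, recommendation: int) -> int:
        self.history.append((now, recommendation))
        # Forget recommendations older than the window.
        while self.history[0][0] < now - self.window:
            self.history.popleft()
        return max(r for _, r in self.history)


stabilizer = DownscaleStabilizer()
print(stabilizer.stabilize(0, 5))    # 5
print(stabilizer.stabilize(60, 3))   # 5: the recommendation of 5 is still in the window
print(stabilizer.stabilize(360, 3))  # 3: it has aged out
```

With default settings, this kind of window helps explain why the api Deployment shrinks in steps, several minutes apart, rather than dropping to the minimum immediately.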