Kubernetes and GitOps
It is hard not to hear about Kubernetes these days; it is probably one of the most well-known open source projects at the moment. It originated around 2014, when a group of engineers from Google started building a container orchestrator based on the experience they had accumulated working with Google’s own internal orchestrator, Borg. The project was open sourced in 2014 and reached version 1.0.0 in 2015, a milestone that encouraged many companies to take a closer look at it.
Another reason for its fast and enthusiastic adoption by the community is the governance of the Cloud Native Computing Foundation (CNCF) (https://www.cncf.io). After making the project open source, Google started discussing with the Linux Foundation (https://www.linuxfoundation.org) the creation of a new nonprofit organization that would lead the adoption of open source cloud-native technologies. That’s how CNCF came to be created, with Kubernetes as its seed project and KubeCon as its major developer conference. By CNCF governance, I am referring mostly to the fact that every project or organization inside CNCF has a well-established structure of maintainers, with details on how they are nominated and how decisions are taken in these groups, and that no single company can hold a simple majority. This ensures that no decision is taken without community involvement and that the overall community plays an important role in a project’s life cycle.
Architecture
Kubernetes has become so big and extensible that it is really hard to define it without resorting to abstractions such as a platform for building platforms. This is because it is just a starting point: you get many pieces, but you have to put them together in a way that works for you (and GitOps is one of those pieces). Calling it a container orchestration platform is not entirely accurate either, because it can also run virtual machines (VMs), not just containers (for more details, please check https://ubuntu.com/blog/what-is-kata-containers); still, the orchestration part remains true.
Its components are split into two main parts. The first is the control plane, which is made of a REpresentational State Transfer (REST) API server with a database for storage (usually etcd), a controller manager used to run multiple control loops, a scheduler that has the job of assigning a node to our Pods (a Pod is a logical grouping of containers that helps run them on the same node; find out more at https://kubernetes.io/docs/concepts/workloads/pods/), and a cloud controller manager to handle any cloud-specific work. The second part is the data plane; while the control plane is about managing the cluster, this one is about what happens on the nodes running the user workloads. A node that is part of a Kubernetes cluster will have a container runtime (which can be Docker, CRI-O, or containerd, among a few others), kubelet, which takes care of the connection between the REST API server and the container runtime of the node, and kube-proxy, responsible for abstracting the network at the node level. See the next diagram for details of how all the components work together and the central role played by the API server.
We are not going to go into the details of all these components. For us, the important ones are the REST API server, which makes the declarative part possible, and the controller manager, which makes the system converge to the desired state, so we want to dissect them a little bit.
The following diagram shows an overview of a typical Kubernetes architecture:
Figure 1.1 – Kubernetes architecture
Note
When looking at an architecture diagram, you need to know that it can only capture part of the whole picture. For example, here, it seems that the cloud provider with its API is an external system, when in fact all the nodes and the control plane are created inside that cloud provider.
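If you have access to a cluster, you can see many of these components as ordinary Kubernetes objects. The exact listing depends on how the cluster was built (a kubeadm-style cluster exposes the control plane as Pods in the kube-system namespace, while managed offerings usually hide it), so treat the following commands as a way to explore, not as a guaranteed inventory:

```shell
# Nodes form the data plane; the wide output also shows each node's container runtime.
kubectl get nodes -o wide

# In a kubeadm-style cluster, kube-system holds the control plane components
# (kube-apiserver, kube-controller-manager, kube-scheduler, etcd) as Pods,
# together with kube-proxy; kubelet itself runs as a service on each node.
kubectl get pods -n kube-system
```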
HTTP REST API server
Viewing Kubernetes from the perspective of the HyperText Transfer Protocol (HTTP) REST API server makes it look like any classic application with REST endpoints and a database for storing state (in our case, usually etcd), and with multiple replicas of the web server for high availability (HA). What is important to emphasize is that anything we want to do with Kubernetes, we need to do via the API; we can’t connect directly to any other component. This is also true for the internal components: they can’t talk directly to each other; they need to go through the API.

From our client machines, we don’t query the API directly (for example, by using curl); instead, we use the kubectl client application, which hides some of the complexity, such as adding authentication headers, preparing the request content, parsing the response body, and so on.
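That said, the raw API remains reachable whenever we want to see what kubectl is hiding. The following is a small illustration; the default namespace and the local port 8001 are arbitrary choices:

```shell
# Ask kubectl to perform a raw GET against the API server and print the JSON response.
kubectl get --raw /api/v1/namespaces/default/pods

# Or open a local, already-authenticated tunnel to the API server and use plain curl,
# without having to deal with certificates or bearer tokens ourselves.
kubectl proxy --port=8001 &
curl http://127.0.0.1:8001/api/v1/namespaces/default/pods
```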
Whenever we run a command such as kubectl get pods, there is an HTTP Secure (HTTPS) call to the API server. The server then goes to the database to fetch details about the Pods, and a response is created and pushed back to the client. The kubectl client application receives it, parses it, and is able to display a nice output suited to a human reader. In order to see exactly what happens, we can use the verbose global flag of kubectl (--v): the higher the value we set, the more details we get.
As an exercise, try kubectl get pods --v=6, which just shows that a GET request is performed, and keep increasing --v to 7, 8, 9, and beyond to see the HTTP request headers, the response headers, part or all of the JavaScript Object Notation (JSON) response, and many other details.
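For convenience, here is the same exercise as a copy-and-paste block; the exact output differs between kubectl versions, so take the comments as a rough guide:

```shell
kubectl get pods --v=6   # logs the HTTP verb, URL, and response code
kubectl get pods --v=7   # adds the request headers
kubectl get pods --v=8   # adds the response headers and a truncated response body
kubectl get pods --v=9   # shows the full JSON response and other details
```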
The API server itself is not responsible for actually changing the state of the cluster; it updates the database with the new values, and other things happen based on those updates. The actual state changes are performed by controllers and components such as the scheduler or kubelet. We are going to drill down into controllers, as they are important for our GitOps understanding.
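One easy way to observe this division of labor is through Pod events: the scheduler records the node assignment, and the kubelet records pulling and starting the containers. The Pod name and image below are just placeholders:

```shell
# Create a throwaway Pod (name and image are illustrative only).
kubectl run demo --image=nginx

# The Events section at the end of the describe output typically shows a "Scheduled"
# event from the scheduler, followed by "Pulling", "Created", and "Started" events
# from the kubelet on the chosen node.
kubectl describe pod demo

# The same events can also be listed directly.
kubectl get events --field-selector involvedObject.name=demo
```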
Controller manager
When reading about Kubernetes (or maybe listening to a podcast), you will hear the word controller quite often. The idea behind it comes from industrial automation or robots, and it is about the converging control loop.
Let’s say we have a robotic arm and we give it a simple command to move to a 90-degree position. The first thing it will do is analyze its current state; maybe it is already at 90 degrees and there is nothing to do. If it isn’t in the right position, the next thing is to calculate the actions to take in order to get to that position, and then it will try to apply those actions to reach the target position.
We start with the observe phase, where we compare the desired state with the current state. Then comes the diff phase, where we calculate the actions to apply, and in the action phase, we perform those actions. After the actions are performed, the observe phase starts again to check whether the arm is in the right position; if it isn’t (maybe something blocked it from getting there), actions are calculated and applied again, and so on, until it reaches the position or perhaps runs out of battery. The control loop continues until, in the observe phase, the current state matches the desired state, so there are no more actions to calculate and apply. You can see a representation of the process in the following diagram:
Figure 1.2 – Control loop
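To make the phases more concrete, here is a rough sketch of the same observe/diff/act cycle written as shell pseudocode. Real controllers run inside the controller manager and react to watch events from the API server instead of polling, and the app=demo label is just a placeholder:

```shell
DESIRED=3   # desired state, normally read from the object's spec

while true; do
  # Observe: how many matching Pods exist right now?
  CURRENT=$(kubectl get pods -l app=demo --no-headers 2>/dev/null | wc -l)

  # Diff: what separates the current state from the desired one?
  DIFF=$((DESIRED - CURRENT))

  # Act: a real controller would create or delete Pods here to close the gap.
  if [ "$DIFF" -ne 0 ]; then
    echo "Drift detected: want $DESIRED Pods, have $CURRENT"
  fi

  sleep 5
done
```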
In Kubernetes, there are many controllers. We have the following:
- ReplicaSet: https://kubernetes.io/docs/concepts/workloads/controllers/replicaset/
- HorizontalPodAutoscaler (HPA): https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/
- A few more, in a non-exhaustive list, at https://kubernetes.io/docs/concepts/workloads/controllers/
The ReplicaSet controller is responsible for running a fixed number of Pods. You create it via kubectl and ask it to run three instances, which is the desired state. It starts by checking the current state (how many Pods are running right now), calculates the actions to take (how many more Pods to start or terminate in order to have three instances), and then performs those actions. There is also the HPA controller, which, based on some metrics, is able to increase or decrease the number of Pods for a Deployment (a Deployment is a construct built on top of Pods and ReplicaSets that allows us to define how Pods are updated: https://kubernetes.io/docs/concepts/workloads/controllers/deployment/). A Deployment relies on a ReplicaSet it builds internally in order to change the number of Pods; after the number is modified, it is still the ReplicaSet controller that runs the control loop to reach the desired number of Pods.
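To see this reconciliation in action, here is a minimal ReplicaSet applied straight from the command line; the name, label, and image are placeholders:

```shell
# Apply a ReplicaSet asking for three Pods; 'replicas: 3' is the desired state.
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: demo-rs
spec:
  replicas: 3
  selector:
    matchLabels:
      app: demo
  template:
    metadata:
      labels:
        app: demo
    spec:
      containers:
      - name: demo
        image: nginx:1.25
EOF

# Delete one of the Pods; the ReplicaSet controller observes the drift and starts a replacement.
kubectl delete "$(kubectl get pods -l app=demo -o name | head -n 1)"
kubectl get pods -l app=demo
```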
A controller’s job is to make sure that the actual state matches the desired state, and controllers never stop trying to reach that final state. More than that, they are specialized by resource type: each one takes care of a small piece of the cluster.
In the preceding examples, we talked about internal Kubernetes controllers, but we can also write our own, and that is what Argo CD really is: a controller whose control loop takes care that the state declared in a Git repository matches the state of the cluster. To be precise, it is not a controller but an operator; the difference is that controllers work with internal Kubernetes objects, while operators deal with two domains: Kubernetes and something else. In our case, the Git repository is the outside part handled by the operator, and it does that using custom resources, a way to extend Kubernetes functionality (https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/).
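As a preview of how this looks in practice, an Argo CD Application is exactly such a custom resource: it points the operator at a Git repository and at a destination cluster and namespace. The repository URL and paths below are placeholders, and applying the manifest assumes Argo CD is already installed in the argocd namespace:

```shell
kubectl apply -f - <<EOF
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: demo-app
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example/gitops-config.git   # the Git side of the two domains
    targetRevision: HEAD
    path: demo
  destination:
    server: https://kubernetes.default.svc                  # the Kubernetes side
    namespace: demo
EOF
```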
So far, we have looked at the Kubernetes architecture with the API server connecting all the components and how the controllers are always working within control loops to get the cluster to the desired state. Next, we will get into details on how we can define the desired state: we will start with the imperative way, continue with the more important declarative way, and show how all these get us one step closer to GitOps.