Getting to know Service Mesh
In the previous section, we read about monolithic architecture and its advantages and disadvantages. We also read about how microservices solve the problem of scalability and provide the flexibility to rapidly deploy and push software changes to production. The cloud makes it easier for an organization to focus on innovation without worrying about lengthy hardware procurement and expensive capital expenditure (CapEx). The cloud also facilitates microservices architecture, not only by providing on-demand infrastructure but also by offering various ready-to-use platforms and building blocks, such as PaaS and SaaS. When organizations build applications, they don’t need to reinvent the wheel every time; instead, they can leverage ready-to-use databases, various platforms including Kubernetes, and Middleware as a Service (MWaaS).
In addition to the cloud, microservice developers also leverage containers, which make microservices development much easier by providing a consistent environment and the compartmentalization needed for a modular, self-contained microservices architecture. On top of containers, developers should also use a container orchestration platform such as Kubernetes, which simplifies the management of containers and takes care of concerns such as networking, resource allocation, scalability, reliability, and resilience. Kubernetes also helps to optimize infrastructure cost by making better use of the underlying hardware. When you combine the cloud, Kubernetes, and microservices architecture, you have all the ingredients you need to deliver potent software applications that not only do the job you want them to do but also do it cost-effectively.
So, the question on your mind must be, “Why do I need a Service Mesh?” or “Why do I need a Service Mesh if I am already using the cloud, Kubernetes, and microservices?” These are great questions to ask and think about, and the answer becomes evident once you are confidently deploying microservices on Kubernetes and then reach a tipping point where the networking between microservices simply becomes too complex to address using Kubernetes’ native features.
Fallacies of distributed computing
The fallacies of distributed computing are a set of eight assertions made by L. Peter Deutsch and others at Sun Microsystems. They are false assumptions that software developers often make when designing distributed applications:
- The network is reliable
- Latency is zero
- Bandwidth is infinite
- The network is secure
- The topology doesn’t change
- There is one administrator
- Transport cost is zero
- The network is homogeneous
At the beginning of the Understanding Kubernetes section, we looked at the challenges developers face when implementing microservices architecture. Kubernetes provides various features for deploying containerized microservices, as well as container/Pod life cycle management through declarative configuration, but it falls short of solving the communication challenges between microservices. When talking about those challenges, we used the term application networking. So, let’s first understand what application networking is and why it is so important for the successful operation of microservices.
Application networking is a loosely used term; there are various interpretations of it depending on the context in which it is used. In the context of microservices, we refer to application networking as the enabler of distributed communication between microservices. Microservices can be deployed in one Kubernetes cluster or across multiple clusters over any kind of underlying infrastructure; they can also be deployed in non-Kubernetes environments in the cloud, on-premises, or both. For now, let’s keep our focus on Kubernetes and application networking within Kubernetes.
Irrespective of where microservices are deployed, you need a robust application network in place for them to talk to each other. The underlying platform should facilitate not just communication but resilient communication. By resilient communication, we mean communication that has a high probability of succeeding even when the ecosystem around it is in adverse conditions.
Apart from the application network, you also need visibility of the communication happening between microservices; this is called observability. Observability is important in microservices communication for knowing how the microservices are interacting with each other. It is also important that microservices communicate with each other securely. The communication should be encrypted and defended against man-in-the-middle attacks, and every microservice should have an identity and be able to prove that it is authorized to communicate with other microservices.
So, why Service Meshes? Why can’t these requirements be addressed in Kubernetes? The answer lies in Kubernetes’ architecture and what it was designed to do. As mentioned before, Kubernetes is application life cycle management software. It provides application networking, observability, and security, but at a very basic level that is not sufficient to meet the requirements of modern, dynamic microservices architecture. This doesn’t mean that Kubernetes is not modern software; it is very sophisticated and cutting-edge technology, but its focus is container orchestration.
Traffic management in Kubernetes is handled by the Kubernetes network proxy, also called kube-proxy, which runs on each node in the Kubernetes cluster. kube-proxy communicates with the Kubernetes API server and gets information about Kubernetes Services, which are another level of abstraction for exposing a set of Pods as a network service. kube-proxy implements a form of virtual IP for Services by setting iptables rules that define how any traffic for a Service is routed to its endpoints, which are essentially the underlying Pods hosting the application.
To understand this better, let’s look at the following example. To run it, you will need minikube and kubectl on your computer. If you don’t have this software installed, then I suggest you hold off on installing it, as we will go through the installation steps in Chapter 2.
We will create a Kubernetes deployment and service by following the example in https://minikube.sigs.k8s.io/docs/start/:
$ kubectl create deployment hello-minikube --image=k8s.gcr.io/echoserver:1.4
deployment.apps/hello-minikube created
We just created a deployment object named hello-minikube. Let’s execute the kubectl describe command:
$ kubectl describe deployment/hello-minikube
Name:           hello-minikube
…….
Selector:       app=hello-minikube
…….
Pod Template:
  Labels:       app=hello-minikube
  Containers:
   echoserver:
    Image:      k8s.gcr.io/echoserver:1.4
..
From the preceding code block, you can see that a Pod has been created, containing a container instantiated from the k8s.gcr.io/echoserver:1.4 image. Let’s now check the Pods:
$ kubectl get po
hello-minikube-6ddfcc9757-lq66b   1/1   Running   0   7m45s
The preceding output confirms that a Pod has been created. Now, let’s create a service and expose it so that it is accessible on a cluster-internal IP as well as on a static port on each node, also called a NodePort:
$ kubectl expose deployment hello-minikube --type=NodePort --port=8080
service/hello-minikube exposed
Let’s describe the service:
$ kubectl describe services/hello-minikube
Name:                     hello-minikube
Namespace:                default
Labels:                   app=hello-minikube
Annotations:              <none>
Selector:                 app=hello-minikube
Type:                     NodePort
IP:                       10.97.95.146
Port:                     <unset>  8080/TCP
TargetPort:               8080/TCP
NodePort:                 <unset>  31286/TCP
Endpoints:                172.17.0.5:8080
Session Affinity:         None
External Traffic Policy:  Cluster
From the preceding output, you can see that a Kubernetes service named hello-minikube has been created and is accessible on port 31286, also called the NodePort. We also see that there is an Endpoints object with the value 172.17.0.5:8080. Soon, we will see the connection between NodePort and Endpoints.
Let’s dig deeper and look at what is happening with iptables. If you would like to see what the preceding service returns, you can simply use minikube service. We are using macOS, where minikube itself runs as a VM, so we will need to ssh into minikube to see what’s happening with iptables. On Linux host machines, the following step is not required:
$ minikube ssh
Let’s check the iptables:
$ sudo iptables -L KUBE-NODEPORTS -t nat
Chain KUBE-NODEPORTS (1 references)
target                     prot  opt  source    destination
KUBE-MARK-MASQ             tcp   --   anywhere  anywhere     /* default/hello-minikube */  tcp dpt:31286
KUBE-SVC-MFJHED5Y2WHWJ6HX  tcp   --   anywhere  anywhere     /* default/hello-minikube */  tcp dpt:31286
We can see that there are two iptables rules associated with the hello-minikube service. Let’s look further into these iptables rules:
$ sudo iptables -L KUBE-MARK-MASQ -t nat
Chain KUBE-MARK-MASQ (23 references)
target  prot  opt  source    destination
MARK    all   --   anywhere  anywhere     MARK or 0x4000

$ sudo iptables -L KUBE-SVC-MFJHED5Y2WHWJ6HX -t nat
Chain KUBE-SVC-MFJHED5Y2WHWJ6HX (2 references)
target                     prot  opt  source    destination
KUBE-SEP-EVPNTXRIBDBX2HJK  all   --   anywhere  anywhere     /* default/hello-minikube */
The first rule, KUBE-MARK-MASQ, simply adds a packet mark with the value 0x4000 to all traffic destined for port 31286.
The second rule, KUBE-SVC-MFJHED5Y2WHWJ6HX, routes the traffic to another chain, KUBE-SEP-EVPNTXRIBDBX2HJK. Let’s look further into it:
$ sudo iptables -L KUBE-SEP-EVPNTXRIBDBX2HJK -t nat
Chain KUBE-SEP-EVPNTXRIBDBX2HJK (1 references)
target          prot  opt  source      destination
KUBE-MARK-MASQ  all   --   172.17.0.5  anywhere     /* default/hello-minikube */
DNAT            tcp   --   anywhere    anywhere     /* default/hello-minikube */  tcp to:172.17.0.5:8080
Note that this rule performs destination network address translation (DNAT) to 172.17.0.5:8080, which is the endpoint address we saw when we created the service.
Let’s scale the number of Pod replicas:
$ kubectl scale deployment/hello-minikube --replicas=2
deployment.apps/hello-minikube scaled
Describe the service to find any changes:
$ kubectl describe services/hello-minikube
Name:                     hello-minikube
Namespace:                default
Labels:                   app=hello-minikube
Annotations:              <none>
Selector:                 app=hello-minikube
Type:                     NodePort
IP:                       10.97.95.146
Port:                     <unset>  8080/TCP
TargetPort:               8080/TCP
NodePort:                 <unset>  31286/TCP
Endpoints:                172.17.0.5:8080,172.17.0.7:8080
Session Affinity:         None
External Traffic Policy:  Cluster
Note that the value of Endpoints has changed; let’s also describe the hello-minikube endpoint:
$ kubectl describe endpoints/hello-minikube
Name:     hello-minikube
…
Subsets:
  Addresses:          172.17.0.5,172.17.0.7
  NotReadyAddresses:  <none>
  Ports:
    Name     Port  Protocol
    ----     ----  --------
    <unset>  8080  TCP
Note that the endpoints now also target 172.17.0.7 along with 172.17.0.5; 172.17.0.7 is the new Pod that has been created as a result of increasing the number of replicas to 2.
Figure 1.5 – Service, endpoints, and Pods
Let’s check the iptables rules now:
$ sudo iptables -t nat -L KUBE-SVC-MFJHED5Y2WHWJ6HX
Chain KUBE-SVC-MFJHED5Y2WHWJ6HX (2 references)
target                     prot  opt  source    destination
KUBE-SEP-EVPNTXRIBDBX2HJK  all   --   anywhere  anywhere     /* default/hello-minikube */  statistic mode random probability 0.50000000000
KUBE-SEP-NXPGMUBGGTRFLABG  all   --   anywhere  anywhere     /* default/hello-minikube */
You will find that an additional rule, KUBE-SEP-NXPGMUBGGTRFLABG, has been added, and because of the statistic mode random probability of 0.5, each packet handled by KUBE-SVC-MFJHED5Y2WHWJ6HX is then distributed 50–50 between KUBE-SEP-EVPNTXRIBDBX2HJK and KUBE-SEP-NXPGMUBGGTRFLABG.
Let’s also quickly examine the new chain added after we changed the number of replicas to 2:
$ sudo iptables -t nat -L KUBE-SEP-NXPGMUBGGTRFLABG
Chain KUBE-SEP-NXPGMUBGGTRFLABG (1 references)
target          prot  opt  source      destination
KUBE-MARK-MASQ  all   --   172.17.0.7  anywhere     /* default/hello-minikube */
DNAT            tcp   --   anywhere    anywhere     /* default/hello-minikube */  tcp to:172.17.0.7:8080
Note that another DNAT entry has been added for 172.17.0.7. So, essentially, the new chain and the previous one now route traffic to their corresponding Pods.
So, if we summarize everything, kube-proxy runs on every Kubernetes node and keeps a watch on service and endpoint resources. Based on service and endpoint configurations, kube-proxy then creates iptables rules to take care of routing data packets between the consumer/client and the Pod.
The following diagram depicts the creation of iptables rules via kube-proxy and how consumers connect with Pods.
Figure 1.6 – The client connecting to a Pod based on the iptables rule chain
kube-proxy can also run in another mode called IP Virtual Server (IPVS), in which it programs the Linux kernel’s IPVS load balancing feature instead of iptables rules to route traffic to the backend Pods.
Tip
To find out the mode in which kube-proxy is running, you can use $ curl localhost:10249/proxyMode. On Linux, you can run curl directly, but on macOS, you need to run it from within the minikube VM.
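For example, with the minikube cluster from the preceding walkthrough still running, the check could look like the following sketch (the response depends on how your cluster is configured; iptables is the default on most installations):

$ minikube ssh -- curl -s localhost:10249/proxyMode
# Prints the active mode, typically: iptables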
So, what is wrong with kube-proxy using iptables or IPVS?
kube-proxy doesn’t provide any fine-grained configuration; all settings are applied to all traffic on that node. kube-proxy can only do simple TCP, UDP, and SCTP stream forwarding, or round-robin TCP, UDP, and SCTP forwarding across a set of backends. As the number of Kubernetes Services grows, so does the number of rulesets in iptables, and because iptables rules are processed sequentially, performance degrades as the number of microservices grows. Also, iptables only supports simple probabilities for distributing traffic, which is very rudimentary. Kubernetes delivers a few other tricks, but not enough to cater to resilient communication between microservices. For microservice communication to be resilient, you need more than iptables-based traffic management.
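If you want to see this growth for yourself, a rough check is to count the NAT rules that kube-proxy has programmed, run from inside the minikube VM. This is only an illustrative sketch, and the count will differ on every cluster:

$ minikube ssh
# Count kube-proxy-managed NAT rules; this number grows with every
# Service and endpoint added to the cluster
$ sudo iptables -t nat -S | grep -c KUBE-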
Let’s now talk about a couple of capabilities required to have resilient, fault-tolerant communication.
Retry mechanism, circuit breaking, timeouts, and deadlines
If one Pod is not functioning, then the traffic should automatically be sent to another Pod. Also, retries need to be done under constraints so as not to make the communication problem worse. For example, if a call fails, then maybe the system needs to wait before retrying. If the retry is not successful, then maybe it’s better to increase the wait time. If it is still not successful, maybe it’s worth abandoning retry attempts altogether and breaking the circuit for subsequent connections.
Circuit breaking is a mechanism named after the electric circuit breaker: when there is a fault that makes a system unsafe to operate, the electric circuit breaker automatically trips. Similarly, consider microservices communication where one service is calling another and the called service is not responding, is responding so slowly that it is detrimental to the calling service, or this behavior has occurred often enough to reach a predefined threshold. In such cases, it is better to trip (stop) the circuit (communication) so that when the calling service (downstream) calls the underlying service (upstream), the communication fails straight away. The reason it makes sense to stop the downstream system from calling the upstream system is to avoid wasting resources such as network bandwidth, threads, I/O, CPU, and memory on an activity that has a significantly high probability of failing. Circuit breaking doesn’t resolve the communication problem; instead, it stops the problem from jumping boundaries and impacting other systems.

Timeouts are also important during microservices communication, so that a downstream service waits for a response from the upstream system only for as long as the response would still be valid or worth waiting for. Deadlines build further on timeouts; you can see them as timeouts for the whole request, not just one connection. By specifying a deadline, a downstream system tells the upstream system the overall maximum time permissible for processing the request, including subsequent calls to other upstream microservices involved in processing it.
Important note
In a microservices architecture, downstream systems are the ones that rely on the upstream system. If service A calls service B, then service A will be called downstream and service B will be called upstream. When drawing a north–south architecture diagram to show a data flow between A and B, you will usually draw A at the top with an arrow pointing down toward B, which makes it confusing to call A downstream and B upstream. To make it easy to remember, you can draw the analogy that a downstream system depends on an upstream system. This way, microservice A depends on microservice B; hence, A is downstream and B is upstream.
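To make the preceding ideas concrete, here is a minimal sketch of how a Service Mesh such as Istio lets you declare retries, timeouts, and circuit breaking as configuration rather than application code. The resource kinds and fields are real Istio APIs, but the host name (hello-minikube) and all the thresholds are illustrative assumptions, not recommendations:

$ kubectl apply -f - <<EOF
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: hello-minikube
spec:
  hosts:
  - hello-minikube
  http:
  - route:
    - destination:
        host: hello-minikube
    timeout: 5s              # overall deadline for the request
    retries:
      attempts: 3            # retry a failed call up to three times
      perTryTimeout: 1s      # timeout applied to each individual attempt
      retryOn: 5xx,connect-failure
---
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: hello-minikube
spec:
  host: hello-minikube
  trafficPolicy:
    outlierDetection:        # circuit breaking: eject endpoints that keep failing
      consecutive5xxErrors: 5
      interval: 10s
      baseEjectionTime: 30s
      maxEjectionPercent: 100
EOF

The point is that none of this logic lives in the microservice itself; the proxies sitting next to each service enforce it on every call.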
Blue/green and canary deployments
Blue/green deployments are scenarios where you deploy a new (green) version of a service side by side with the previous/existing (blue) version. You run stability checks to ensure that the green environment can handle live traffic and, if it can, you transfer the traffic from the blue to the green environment.
Blue and green can be different versions of a service in the same cluster, or services in independent clusters. If something goes wrong with the green environment, you can switch the traffic back to the blue environment. The transfer of traffic from blue to green can also happen gradually (canary deployment) in various ways – for example, at a certain rate, such as 90:10 in the first 10 minutes, 70:30 in the next 10 minutes, 50:50 in the next 20 minutes, and 0:100 after that. Another option is to apply such a split only to a certain class of traffic – for example, only to requests carrying a certain HTTP header value. While in a blue/green deployment you deploy like-for-like environments side by side, in a canary deployment you can deploy only a subset of what you would deploy in the green environment. These features are difficult to achieve in Kubernetes because it does not support fine-grained distribution of traffic.
The following diagram depicts blue/green and canary deployments.
Figure 1.7 – Blue/green deployment
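The following sketch shows what such a weighted split can look like as Service Mesh configuration (again Istio-style; the subset labels, versions, and the 90:10 weights are hypothetical). Adjusting the weights over time, or adding a match on an HTTP header, gives you the gradual and class-based traffic shifting described above:

$ kubectl apply -f - <<EOF
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: hello-minikube
spec:
  host: hello-minikube
  subsets:
  - name: v1                 # existing (blue) version, selected by Pod label
    labels:
      version: v1
  - name: v2                 # new (green/canary) version
    labels:
      version: v2
---
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: hello-minikube
spec:
  hosts:
  - hello-minikube
  http:
  - route:
    - destination:
        host: hello-minikube
        subset: v1
      weight: 90             # 90% of the requests stay on the current version
    - destination:
        host: hello-minikube
        subset: v2
      weight: 10             # 10% canary traffic goes to the new version
EOF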
To handle concerns such as blue/green and canary deployments, we need something that can handle the traffic at layer 7 rather than layer 4. There are frameworks, such as Netflix Open Source Software (OSS) and a few others, that solve distributed system communication challenges, but in doing so, they shift the responsibility of solving application networking challenges onto microservice developers. Solving these concerns in application code is not only expensive and time-consuming but also distracts from the real goal, which is delivering business outcomes. Frameworks and libraries such as Netflix OSS are written in specific programming languages, which constrains developers to use only compatible technologies and languages for building microservices, going against the polyglot concept.
What is needed is a kind of proxy that can work alongside an application without requiring the application to have any knowledge of the proxy itself. The proxy should not just proxy the communication but also have intricate knowledge of the services doing the communicating, along with the context of the communication. The application/service can then focus on business logic and let the proxy handle all concerns related to communication with other services. Envoy is one such proxy; it works at layer 7 and is designed to run alongside microservices. When it does so, it forms a transparent communication mesh with the other Envoy proxies running alongside their respective microservices. Each microservice communicates only with Envoy on localhost, and Envoy takes care of the communication with the rest of the mesh. In this communication model, the microservices don’t need to know about the network. Envoy is extensible because it has a pluggable filter chain mechanism for network layers 3, 4, and 7, allowing new filters to be added as needed to perform various functions, such as TLS client certificate authentication and rate limiting.
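To give a flavor of what this looks like, here is a minimal, hand-written Envoy configuration sketch: the application sends every outbound request to Envoy on localhost, and Envoy forwards it to the actual upstream service. The file name, port numbers, and upstream host are illustrative assumptions; in a Service Mesh, this configuration is generated and pushed dynamically by the control plane rather than written by hand:

$ cat > envoy-demo.yaml <<'EOF'
static_resources:
  listeners:
  - name: outbound
    address:
      socket_address: { address: 127.0.0.1, port_value: 15001 }   # the app only ever talks to localhost
    filter_chains:
    - filters:
      - name: envoy.filters.network.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          stat_prefix: outbound_http
          route_config:
            virtual_hosts:
            - name: backend
              domains: ["*"]
              routes:
              - match: { prefix: "/" }
                route: { cluster: backend }
          http_filters:
          - name: envoy.filters.http.router
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
  clusters:
  - name: backend
    type: STRICT_DNS           # Envoy, not the application, knows where the other service lives
    load_assignment:
      cluster_name: backend
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address: { address: hello-minikube.default.svc.cluster.local, port_value: 8080 }
EOF
$ envoy -c envoy-demo.yaml     # assumes an Envoy binary is installed locally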
So, how are Service Meshes related to Envoy? A Service Mesh is an infrastructure layer responsible for application networking. The following diagram depicts the relationship between the Service Mesh control plane, the Kubernetes API server, the Service Mesh sidecar, and other containers in the Pod.
Figure 1.8 – Service Mesh sidecars, data, and the control plane
A Service Mesh provides a data plane, which is basically a collection of application-aware proxies such as Envoy, which are then controlled by a set of components called the control plane. In a Kubernetes-based environment, the service proxies are injected as sidecars into Pods without needing any modification to the existing containers within the Pod. A Service Mesh can be added to Kubernetes as well as to traditional environments, such as virtual machines. Once added to the runtime ecosystem, the Service Mesh takes care of the application networking concerns we discussed earlier, such as load balancing, timeouts, retries, canary and blue/green deployments, security, and observability.
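As a brief preview of how the sidecar pattern is applied in practice, a mesh such as Istio typically injects the proxy automatically once a namespace is labeled for injection, and existing Pods pick it up after being recreated. The commands below are only a sketch and assume Istio is already installed in the cluster, which we will set up properly in later chapters:

$ kubectl label namespace default istio-injection=enabled
$ kubectl rollout restart deployment/hello-minikube   # recreate the Pods so the sidecar is injected
$ kubectl get pods                                    # each Pod now reports 2/2 containers: the application and the proxy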