The need for containers
Containers are in vogue lately, and for good reason. They address one of modern computing's most pressing problems – running reliable, distributed software with near-infinite scalability in any computing environment.
They have enabled an entirely new discipline in software engineering – microservices. They have also introduced the package once, deploy anywhere concept to technology. Combined with the cloud and distributed applications, containers and container orchestration technology have given rise to a new buzzword in the industry – cloud-native – and changed the IT ecosystem like never before.
Before we delve into more technical details, let’s understand containers in plain and simple words.
Containers derive their name from shipping containers, so I will explain them using a shipping container analogy. Historically, as transportation improved, goods began moving across geographies in ever larger volumes. With various goods being carried by different modes of transport, loading and unloading them was a massive issue at every transfer point. In addition, with rising labor costs, it was impractical for shipping companies to operate at scale while keeping prices low.
This manual handling also resulted in frequent damage, and goods were often misplaced or mixed up with other consignments because there was no isolation between them. There was a need for a standard way of transporting goods that provided the necessary isolation between consignments and allowed for easy loading and unloading. The shipping industry came up with the shipping container as an elegant solution to this problem.
Now, shipping containers have simplified a lot of things in the shipping industry. With a standard container, we can ship goods from one place to another by only moving the container. The same container can be used on roads, loaded on trains, and transported via ships. The operators of these vehicles don’t need to worry about what is inside the container most of the time. The following figure depicts the entire workflow graphically for ease of understanding:
Figure 1.2 – Shipping container workflow
Similarly, there have been issues with software portability and compute resource management in the software industry. In a standard software development life cycle, a piece of software moves through multiple environments, and sometimes, numerous applications share the same OS. There may be differences in the configuration between environments, so software that may have worked in a development environment may not work in a test environment. Something that worked in test may also not work in production.
Also, when you have multiple applications running within a single machine, there is no isolation between them. One application can drain compute resources from another application, and that may lead to runtime issues.
Repackaging and reconfiguring applications are required at every step of deployment, which takes a lot of time and effort and is error-prone.
In the software industry, containers solve these problems by isolating applications from one another and by managing compute resources for each of them.
The software industry’s biggest challenge is to provide application isolation and manage external dependencies elegantly so that they can run on any platform, irrespective of the OS or the infrastructure. Software is written in numerous programming languages and uses various dependencies and frameworks. This leads to a scenario called the matrix of hell.
The matrix of hell
Let’s say you’re preparing a server that will run multiple applications for multiple teams. Now, assume that you don’t have a virtualized infrastructure and that you need to run everything on one physical machine, as shown in the following diagram:
Figure 1.3 – Applications on a physical server
One application uses one particular version of a dependency, while another application uses a different one, and you end up managing two versions of the same software on one system. When you scale the system to fit multiple applications, you end up managing hundreds of dependencies and various versions catering to different applications. It slowly becomes unmanageable within one physical system. This scenario is known as the matrix of hell in popular computing nomenclature.
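To make the problem concrete, here is a small, hypothetical sketch of two applications that pin different versions of the same Python library on a shared host (the package and version numbers are purely illustrative):

$ pip install "requests==2.18.0"    # version pinned by application A
$ pip install "requests==2.31.0"    # version pinned by application B; this silently replaces A's copy
$ pip show requests | grep Version  # only one version can exist system-wide
Version: 2.31.0

Application A now runs against a library version it was never tested with, and the problem multiplies with every additional application, language runtime, and OS package you add to the host.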
Multiple solutions have emerged to escape the matrix of hell, but two technological contributions stand out – virtual machines and containers.
Virtual machines
A virtual machine emulates a complete machine using a technology called a hypervisor. A hypervisor can run as software on a physical host OS or as firmware on a bare-metal machine. Each virtual machine runs its own guest OS on top of the hypervisor. With this technology, you can subdivide a sizeable physical machine into multiple smaller virtual machines, each catering to a particular application. This has revolutionized computing infrastructure for almost two decades and is still in use today. Some of the most popular hypervisors on the market are VMware ESXi and Oracle VirtualBox.
The following diagram shows the same stack on virtual machines. You can see that each application now contains a dedicated guest OS, each of which has its own libraries and dependencies:
Figure 1.4 – Applications on virtual machines
Though the approach is acceptable, it is like using an entire ship for your goods rather than a single container, to return to the shipping analogy. Virtual machines are heavy on resources, as you need a whole guest OS layer to isolate applications rather than something more lightweight. We need to allocate dedicated CPU and memory to each virtual machine, and resource sharing is suboptimal since people tend to overprovision virtual machines to cater for peak load. Virtual machines are also slower to start, and scaling them is traditionally more cumbersome because multiple moving parts and technologies are involved. Therefore, automating horizontal scaling (handling more traffic from users by adding more machines to the resource pool) with virtual machines is not very straightforward. Also, sysadmins now have to deal with multiple servers rather than numerous libraries and dependencies in one. It is better than before, but it is not optimal from a compute resource point of view.
Containers
This is where containers come into the picture. Containers solve the matrix of hell without needing a heavy guest OS layer between applications. Instead, they isolate the application runtime and its dependencies by encapsulating them in an abstraction called a container. Now, multiple containers run on a single OS, and the numerous applications running in those containers share the same infrastructure. As a result, they do not waste your computing resources. You also do not have to worry about application libraries and dependencies, as they are isolated from other applications – a win-win situation for everyone!
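As an illustrative sketch (the image names and tags here are assumptions, not images shipped with this book), the two conflicting applications from the matrix of hell example can each be packaged with their own dependency version and run side by side on the same OS:

$ docker run -d --name app-a app-a:1.0    # image bundles requests 2.18.0 for application A
$ docker run -d --name app-b app-b:1.0    # image bundles requests 2.31.0 for application B
$ docker ps --format '{{.Names}}'         # both containers share one kernel yet stay isolated
app-b
app-a

Each container carries its own libraries and dependencies, so neither can overwrite or interfere with the other's.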
Containers run on container runtimes. While Docker is the most popular and more or less the de facto container runtime, other options are available on the market, such as rkt and containerd. They all rely on the same Linux kernel cgroups feature, whose basis comes from the combined efforts of Google, IBM, OpenVZ, and SGI to embed OpenVZ into the mainline Linux kernel. OpenVZ was an early attempt at providing virtual environments within a Linux kernel without using a guest OS layer – what we now call containers.
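Docker, for instance, exposes these cgroup controls directly as flags on docker run; the limits below are illustrative values rather than recommendations:

$ docker run -d --name web --cpus="0.5" --memory="256m" nginx:alpine
# cgroups now cap this container at half a CPU core and 256 MiB of RAM,
# so it cannot drain compute resources from other applications on the same host.

This is the same isolation that was missing when multiple applications shared a bare physical server.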
It works on my machine
You might have heard this phrase many times in your career. It is the typical situation where erratic developers worry your test team with "But, it works on my machine" and your testing team responds with "We are not going to deliver your machine to the client." Containers apply the Build once, run anywhere and Package once, deploy anywhere concepts to solve the It works on my machine syndrome. As containers only need a container runtime to run, they behave the same way on every machine. A standardized setup for applications also means that the sysadmin's job is reduced to taking care of the container runtime and the servers, while responsibility for the application is delegated to the development team. This removes much of the admin overhead from software delivery, and software development teams can now spearhead development without many external dependencies – a great power indeed!
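Here is a minimal sketch of the idea, assuming a simple Python application (the file names, base image, and tag are illustrative):

$ cat Dockerfile
# The base image pins the OS libraries and the language runtime
FROM python:3.11-slim
# The application code travels inside the image
COPY app.py /app/app.py
# The same start command runs identically everywhere
CMD ["python", "/app/app.py"]
$ docker build -t my-app:1.0 .    # build (package) the image once
$ docker run my-app:1.0           # the identical artifact runs on any machine with a container runtime

The image that passed testing is the very artifact that ships to production, which is what removes the "it works on my machine" argument. Now, let's look at how containers are designed to do that.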