Developing distributed applications
Monolithic applications, as we saw in the previous section, are applications in which all functionalities run together. Most of these applications were created for specific hardware, operating systems, libraries, binary versions, and so on. To run them in production, you need at least one dedicated server with the right hardware, operating system, and libraries, and developers need a similar node architecture and resources even just to fix application issues. On top of that, pre-production environments for tasks such as certification and testing multiply the number of servers significantly. Even if your enterprise had the budget for all these servers, any maintenance task resulting from an upgrade of an operating system component in production should always be replicated in all the other environments. Automation helps replicate changes between environments, but it is not easy: you still have to replicate the environments themselves and maintain them. Moreover, in the old days, provisioning a new node could take months (preparing the specifications, drawing up the budget, submitting it to your company’s approval workflow, finding a hardware provider, and so on). Virtualization helped system administrators provision new nodes for developers faster, and automation (provided by tools such as Chef, Puppet, and, my favorite, Ansible) made it possible to keep changes aligned across all environments. As a result, developers could obtain their development environments quickly and be sure they were using aligned versions of system resources, which improved the application maintenance process.
Virtualization also worked very well with the three-tier application architecture. For example, it made it easy to give developers a database server to connect to while coding new changes. The problem with virtualization is that it replicates a complete operating system along with the server application components when we only need the software part. A lot of hardware resources are consumed by the operating system alone, and restarting these nodes takes time because each one is a complete operating system running on top of a hypervisor, which itself runs on a physical server with its own operating system.
However, developers were still hampered by outdated operating system releases and packages, which made it difficult for them to evolve their applications. System administrators started to manage hundreds of virtual hosts and, even with automation, they couldn’t keep operating system and application life cycles aligned. Provisioning virtual machines through cloud providers’ Infrastructure-as-a-Service (IaaS) platforms, using their Platform-as-a-Service (PaaS) environments, and scripting the infrastructure through their APIs (Infrastructure as Code, or IaC) helped, but the problem wasn’t fully resolved because of the rapidly growing number of applications and required changes. The application life cycle went from one or two updates per year to dozens per day.
Developers started to use cloud-provided services, and their scripts and applications quickly became more important than the infrastructure they ran on, which today seems completely normal and logical. Faster network communications and the growing reliability of distributed systems made it easier to deploy our applications anywhere, and data centers became smaller. We can say that developers started this movement, and it became so popular that we ended up decoupling application components from the underlying operating systems.
Software containers are the evolution of process isolation features developed over the course of computing history. Mainframe computers allowed us to share CPU time and memory resources many years ago. Chroot environments and, on BSD operating systems, jails were common ways of sharing operating system resources with users, who could use the binaries and libraries that system administrators had prepared for them. On Solaris systems, we had zones as resource containers, which acted as completely isolated virtual servers within a single operating system instance.
So, why not isolate processes instead of full operating systems? This is the main idea behind containers. Containers use kernel features to provide process isolation at the operating system level: all processes run on the same host but are isolated from each other, so each process has its own set of resources while sharing the same host kernel.
Linux kernels have supported this kind of process grouping since the late 2000s in the form of control groups (cgroups). This feature allows the Linux kernel to manage, limit, and account for the resources used by groups of processes.
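To make cgroups a little more concrete, here is a minimal sketch, assuming a Linux host. It parses the `/proc/<pid>/cgroup` file, whose lines have the form `hierarchy-ID:controller-list:cgroup-path`, and interprets two cgroup v2 (unified hierarchy) limit files: `cpu.max`, which holds `$MAX $PERIOD` in microseconds, and `memory.max`, which holds a byte count; in both, `max` means no limit. The function names are hypothetical helpers written for this illustration, not part of any library.

```python
import os
from typing import List, Optional, Tuple


def parse_cgroup_membership(text: str) -> List[Tuple[str, List[str], str]]:
    """Parse the contents of /proc/<pid>/cgroup (hypothetical helper).

    Each line has the form "hierarchy-ID:controller-list:cgroup-path".
    On a pure cgroup v2 host there is a single line such as "0::/user.slice".
    """
    entries = []
    for line in text.strip().splitlines():
        # maxsplit=2 keeps any colons that appear inside the cgroup path
        hierarchy_id, controllers, path = line.split(":", 2)
        entries.append((hierarchy_id, [c for c in controllers.split(",") if c], path))
    return entries


def cpu_limit_from_cpu_max(text: str) -> Optional[float]:
    """Turn a cgroup v2 cpu.max value ("$MAX $PERIOD") into a number of CPUs.

    Returns None when $MAX is "max", i.e. the group has no CPU bandwidth limit.
    """
    quota, period = text.split()
    if quota == "max":
        return None
    return int(quota) / int(period)


def memory_limit_from_memory_max(text: str) -> Optional[int]:
    """Turn a cgroup v2 memory.max value into bytes, or None for "max" (no limit)."""
    value = text.strip()
    return None if value == "max" else int(value)


if __name__ == "__main__":
    # Print the control group(s) the current process belongs to (Linux only).
    if os.path.exists("/proc/self/cgroup"):
        with open("/proc/self/cgroup") as f:
            for hierarchy_id, controllers, path in parse_cgroup_membership(f.read()):
                print(hierarchy_id, ",".join(controllers), path)
```

For example, Docker documents its `--cpus=0.5` option as equivalent to a CPU quota of 50000 µs per 100000 µs period, which `cpu_limit_from_cpu_max("50000 100000")` reports as 0.5 CPUs. The process is not slowed down by emulation; the kernel simply stops scheduling the group once it has used its share of each period.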
Another very important Linux kernel feature used by containers is kernel namespaces, which allow Linux to run processes with their own process hierarchy, network interfaces, users, filesystem mounts, and inter-process communication. By combining kernel namespaces and control groups, we can completely isolate a process within an operating system. The process runs as if it were on its own system, seeing its own operating system environment and limited to a defined amount of CPU and memory (we can even limit its disk I/O).
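Namespaces can be observed directly on a Linux host: each process exposes its namespaces as symbolic links under `/proc/<pid>/ns/`, whose targets look like `net:[4026531840]`. Two processes share a namespace when those inode numbers match. The following is a minimal sketch under that assumption; the helper names are hypothetical, and the list of namespace types is only a common subset (newer kernels expose more).

```python
import os
import re
from typing import Dict, Tuple

# A common subset of the namespace types exposed under /proc/<pid>/ns.
NAMESPACE_TYPES = ("mnt", "uts", "ipc", "pid", "net", "user", "cgroup")

# Link targets look like "net:[4026531840]".
_NS_LINK = re.compile(r"^(\w+):\[(\d+)\]$")


def parse_ns_link(target: str) -> Tuple[str, int]:
    """Split a namespace link target into (type, inode number)."""
    match = _NS_LINK.match(target)
    if match is None:
        raise ValueError(f"unexpected namespace link: {target!r}")
    return match.group(1), int(match.group(2))


def namespaces_of(pid: str = "self") -> Dict[str, int]:
    """Return {namespace type: inode} for a process, skipping unreadable entries."""
    result: Dict[str, int] = {}
    for ns in NAMESPACE_TYPES:
        try:
            kind, inode = parse_ns_link(os.readlink(f"/proc/{pid}/ns/{ns}"))
        except OSError:
            continue  # not Linux, namespace type unsupported, or permission denied
        result[kind] = inode
    return result


def shared_namespaces(a: Dict[str, int], b: Dict[str, int]) -> Dict[str, bool]:
    """For each namespace type known for both processes, report whether it is shared."""
    return {ns: a[ns] == b[ns] for ns in a.keys() & b.keys()}


if __name__ == "__main__":
    print(namespaces_of())
```

Comparing `namespaces_of()` on the host with `namespaces_of("<pid>")` for a process running inside a container typically shows different `net`, `pid`, and `mnt` inodes, even though both processes report the same kernel release through `os.uname().release`. Reading another user’s process links usually requires root privileges.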
The Linux Containers (LXC) project took this idea further and created the first working implementation of it. This project is still available and actively developed, and it was the key to what we now know as Docker containers. LXC introduced terms such as templates to describe the creation of encapsulated processes using kernel namespaces.
Docker took all these concepts and turned them into an open source project, backed by the company Docker Inc., that made it easy to run software containers on our systems. Containers ushered in a great revolution, just as virtualization did more than 20 years ago.
Going back to microservices architecture, ideal application decoupling would mean running specific, well-defined application functionalities as completely standalone and isolated processes. This led to the idea of running the components of microservice applications within containers, with minimal operating system overhead.