So far, we have defined microservices and how processes fit in this model. As we saw previously, containers are related to process isolation. We will define a container as a process with all its requirements isolated using kernel features. This package-like object contains all the code and its dependencies, libraries, binaries, and settings that are required to run our process. With this definition, it is easy to understand why containers are so popular in microservices environments, but, of course, we can run microservices without containers. Conversely, we can run a full application inside a container, with many processes that don't need to be isolated from each other inside this package-like object.
In terms of multi-process containers, what is the difference between a virtual machine and a container? Let's compare container features with those of virtual machines.
Containers are mainly based on cgroups and kernel namespaces.
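To make these two kernel features a little more tangible, here is a minimal sketch that inspects the namespaces and cgroups the current process belongs to. It assumes a Linux host with the usual /proc filesystem mounted; the function names list_namespaces and read_cgroup_membership are our own, chosen for illustration, and both simply return empty results on systems where this information is not available.

```python
import os
from pathlib import Path


def list_namespaces(pid="self", proc_root="/proc"):
    """Return {namespace type: identifier} for a process, or {} if unavailable."""
    ns_dir = Path(proc_root) / str(pid) / "ns"
    namespaces = {}
    try:
        entries = sorted(os.listdir(ns_dir))
    except OSError:
        # Not a Linux host, /proc not mounted, or no permission.
        return namespaces
    for name in entries:
        try:
            # Each entry is a symbolic link whose target names the namespace
            # type and an identifier, for example "pid:[4026531836]".
            namespaces[name] = os.readlink(ns_dir / name)
        except OSError:
            continue
    return namespaces


def read_cgroup_membership(pid="self", proc_root="/proc"):
    """Return the lines of /proc/<pid>/cgroup, or [] if unavailable."""
    cgroup_file = Path(proc_root) / str(pid) / "cgroup"
    try:
        return cgroup_file.read_text().splitlines()
    except OSError:
        return []


if __name__ == "__main__":
    for ns_type, ns_id in list_namespaces().items():
        print(f"{ns_type:20} {ns_id}")
    for line in read_cgroup_membership():
        print("cgroup:", line)
```

Two processes that report the same identifier for a namespace type share that namespace. Running the script on the host and again inside a container would normally show different identifiers for most types, which is the isolation we are describing.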
Virtual machines, on the other hand, are based on hypervisor software. This software, which can run as part of the operating system in many cases, will provide sandboxed resources to the guest virtualized hardware that runs a virtual machine operating system. This means that each virtual machine will run its own operating system and allow us to execute different operating systems on the same hardware host. When virtual machines arrived, people started to use them as sandboxed environments for testing, but as hypervisors gained in maturity, data centers started to have virtual machines in production, and now this is common and standard practice in cloud providers (cloud providers currently offer hardware as a service, too).
In this schema, we're showing the different logical layers, beginning with the machine hardware. There are many layers involved in executing a process inside a virtual machine. Each virtual machine will have its own operating system and services, even if we are just running a single process:
Each virtual machine gets a portion of the host's resources, and the kernel of its guest operating system manages how they are shared among the processes running inside it. Each virtual machine runs its own kernel and operating system on top of those of the host. There is complete isolation between the guest operating systems because the hypervisor software keeps them separated. On the other hand, there is an overhead associated with running multiple operating systems side by side and, when microservices come to mind, this solution wastes a lot of host resources. Just running the operating system consumes significant resources. Even the fastest hardware nodes with fast SSDs need resources and time to start and stop virtual machines. As we have seen, a microservice is just a process with complete functionality inside an application, so running an entire operating system for just a couple of processes doesn't seem like a good idea.
On each guest, we need to configure everything needed for our microservice. This means access, users, configurations, networking, and more. In fact, we need administrators for these systems as if they were bare-metal nodes. This requires a significant amount of effort and is the reason why configuration management tools are so popular these days. Ansible, Puppet, Chef, and SaltStack, among others, help us to homogenize our environments. However, remember that developers need their own environments, too, so multiply these resources by all the required environments in the development pipeline.
How can we scale up on service peaks? Well, we have virtual machine templates and, currently, almost all hypervisors allow us to interact with them using the command line or their own administrative API implementations, so it is easy to copy or clone a node for scaling application components. But this will require double the resources – remember that we will run another complete operating system with its own resources, filesystems, network, and so on. Virtual machines are not the perfect solution for elastic services (which can scale up and down, run everywhere, and are created on-demand in many cases).
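The cost of scaling by cloning can be sketched with some simple arithmetic. The figures below (a 100 MB process and a 500 MB guest operating system) are purely hypothetical and are not measurements; the point is only that the per-instance overhead is paid again for every clone.

```python
def total_memory_mb(replicas, process_mb, per_instance_overhead_mb):
    """Memory needed when every replica carries its own fixed overhead."""
    return replicas * (process_mb + per_instance_overhead_mb)


# Hypothetical figures, for illustration only.
PROCESS_MB = 100    # memory used by the microservice process itself
GUEST_OS_MB = 500   # memory used by each guest operating system

if __name__ == "__main__":
    for replicas in (1, 2, 4):
        total = total_memory_mb(replicas, PROCESS_MB, GUEST_OS_MB)
        overhead = replicas * GUEST_OS_MB
        print(f"{replicas} VM(s): {total} MB in total, {overhead} MB of it is guest OS")
```

With these assumed numbers, most of the memory of every clone goes to the guest operating system rather than to the microservice.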
Containers will share the same kernel because they are just isolated processes. We will just add a templated filesystem and resources (CPU, memory, disk I/O, network, and so on, and, in some cases, host devices) to a process. It will run sandboxed inside and will only use its defined environment. As a result, containers are lightweight and start and stop as fast as their main processes. In fact, containers are as lightweight as the processes they run, since we don't have anything else running inside a container. All the resources that are consumed by a container are process-related. This is great in terms of hardware resource allocation. We can find out the real consumption of our application by observing the load of all of its microservices.
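As a rough illustration of adding resources to a process, the following sketch builds a docker run command line that caps CPU and memory for a container. It only assembles the argument list and does not execute anything. The helper name build_docker_run is hypothetical, and the alpine image and sleep command are just placeholders; --rm, --cpus, --memory, and --read-only are standard docker run options.

```python
import shlex


def build_docker_run(image, command, cpus=None, memory=None, read_only=False):
    """Assemble (but do not run) a 'docker run' argument list with resource limits."""
    args = ["docker", "run", "--rm"]
    if cpus is not None:
        args += ["--cpus", str(cpus)]      # CPU share, for example 0.5 of one CPU
    if memory is not None:
        args += ["--memory", memory]       # memory limit, for example "256m"
    if read_only:
        args.append("--read-only")         # mount the root filesystem read-only
    args.append(image)
    args += list(command)
    return args


if __name__ == "__main__":
    # Placeholder image and command, chosen only for illustration.
    args = build_docker_run("alpine", ["sleep", "30"], cpus=0.5, memory="256m")
    print(shlex.join(args))
```

The resulting list could be passed to subprocess.run on a host where Docker is installed. The limits are enforced by the kernel through cgroups, so the container never sees more than it was given.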
Similar to virtual machines, containers are created from a template, which is called an image. Docker images are a standard for many container runtimes. They ensure that all containers created from the same image will run with the same properties and features. In other words, this eliminates the "it works on my computer!" problem.
Docker containers improve security in our environments because they are secure by default. Kernel isolation and the kind of resources managed inside containers provide a secure environment during execution. There are many ways to improve this security further, as we will see in the following chapters. By default, containers will run with a limited set of system calls allowed.
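One way to observe the system call restriction is to check the seccomp mode that the kernel reports for a process. The sketch below assumes a Linux kernel that exposes a Seccomp field in /proc/self/status; the function names are our own. A process started inside a container with the runtime's default profile would typically report filter, while an ordinary process on the host usually reports disabled.

```python
SECCOMP_MODES = {0: "disabled", 1: "strict", 2: "filter"}


def parse_seccomp_mode(status_text):
    """Extract the seccomp mode from the text of a /proc/<pid>/status file."""
    for line in status_text.splitlines():
        # Match "Seccomp:" exactly so that other fields are ignored.
        if line.startswith("Seccomp:"):
            value = int(line.split(":", 1)[1].strip())
            return SECCOMP_MODES.get(value, f"unknown ({value})")
    return None  # field not present (older kernel or seccomp not built in)


def current_seccomp_mode(status_path="/proc/self/status"):
    """Return the seccomp mode of the current process, or None if unknown."""
    try:
        with open(status_path) as status_file:
            return parse_seccomp_mode(status_file.read())
    except OSError:
        return None  # not a Linux host or /proc not available


if __name__ == "__main__":
    print("seccomp mode:", current_seccomp_mode())
```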
This schema describes the main differences between running processes on different virtual machines and using containers:
Containers are faster to deploy and manage, lightweight, and secure by default. Because they start so quickly, containers fit well with the concept of resilience. And because of the package-like environment, we can run containers everywhere. We only need a container runtime to execute deployments on any cloud provider, just as we do in our own data centers. The same concept applies to all development stages, so integration and performance tests can be run with confidence. Since we use the same artifact across all stages, if the earlier tests passed, we can be confident that it will run the same way in production.
In the following chapters, we will dive deep into Docker container components. For now, however, just think of a Docker container as a sandboxed process that runs in our system, isolated from all other running processes on the same host, and created from a template called a Docker image.