Learning about the main concepts of containers

When talking about containers, we need to understand the main concepts behind the scenes. Let's decouple the container concept into different pieces and try to understand each one in turn.

Container runtime

The container runtime is the software, together with the operating system features, that makes process execution and isolation possible.

Docker, Inc. provides a container runtime named Docker, based on open source projects sponsored by Docker and other well-known companies that drive the container movement (Red Hat/IBM and Google, among many others). This container runtime comes packaged with other components and tools. We will analyze each one in detail in the Docker components section.

Images

We use images as templates for creating containers. Images will contain everything required by our process or processes to run correctly. These components can be binaries, libraries, configuration files, and so on; they may be part of the operating system or simply components you built yourself for the application.

Images, like templates, are immutable. This means that they don't change between executions; every time we use an image, we will get the same results. We only change the configuration and environment to manage the behavior of the process between environments. Developers create their application component as a template and can be sure that, if the application passed all its tests, it will work in production as expected. These features ensure faster workflows and less time to market.

Docker images are built up from a series of layers, and all these layers packaged together contain everything required for running our application process. All these layers are read-only and the changes are stored in the next upper layer during image creation. This way, each layer only has a set of differences from the layer before it.

Layers are packaged to allow ease of transport between different systems or environments, and they include meta-information about the required architecture to run (will it run on Linux or Windows, or does it require an ARM processor, for example?). Images include information about how the process should be run, which user will execute the main process, where persistent data will be stored, what ports your process will expose in order to communicate with other components or users, and more.

Images can be built using reproducible methods with Dockerfiles, or by storing the changes made to a running container as a new image.
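As a minimal sketch of both approaches (the image name, base image, and file paths here are illustrative and not taken from this book), the two workflows could look like this:

    # Reproducible approach: describe the image in a Dockerfile
    # (example ./Dockerfile)
    #   FROM alpine:3.12
    #   COPY app.sh /usr/local/bin/app.sh
    #   CMD ["/usr/local/bin/app.sh"]
    docker build -t myapp:1.0 .

    # Inspect the layers that make up the resulting image
    docker image history myapp:1.0

    # Non-reproducible approach: store the changes made inside a running
    # container as a new image
    docker commit mycontainer myapp:1.0-patched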

This was a quick review of images. Now, let's take a look at containers.

Containers

As we described earlier, a container is a process with all its requirements that runs separately from all the other processes running on the same host. Now that we know what templates are, we can say that containers are created using images as templates. In fact, a container adds a new read-write layer on top of image layers in order to store filesystem differences from these layers. The following diagram represents the different layers involved in container execution. As we can observe, the top layer is what we really call the container because it is read-write and allows changes to be stored on the host disk:

All image layers are read-only layers, which means all the changes are stored in the container's read-write layer. This means that all these changes will be lost when we remove a container from a host, but the image will remain until we remove it. Images are immutable and always remain unchanged.

This container behavior lets us run many containers using the same underlying image, and each one will store changes on its own read-write layer. The following diagram represents how different containers will use the same image layers. All three containers are based on the same image:
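For example (the container names and image are illustrative), we can start several containers from the same image, and each one gets its own read-write layer on top of the shared, read-only image layers:

    # Three containers sharing the same read-only image layers
    docker run -d --name web1 nginx:alpine
    docker run -d --name web2 nginx:alpine
    docker run -d --name web3 nginx:alpine

    # The SIZE column shows each container's own read-write layer and,
    # in parentheses, the virtual size including the shared image layers
    docker ps -s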

There are different approaches to managing image layers at build time and container layers at runtime. Docker uses storage drivers to manage this content, for both the read-only layers and the read-write ones. These drivers are operating system-dependent, but they all implement what is known as a copy-on-write filesystem.

A storage driver (also known as a graph driver) manages how Docker stores layers and handles the interactions between them. As we mentioned previously, there are different driver integrations available, and Docker will choose the best one for your system, depending on your host's kernel and operating system. overlay2 is the most common and preferred driver for Linux operating systems. Others, such as aufs, overlay, and btrfs, are also available, but keep in mind that overlay2 is the recommended driver for production environments on modern operating systems.

Devicemapper is also a supported graph driver. It was very common in Red Hat environments before overlay2 was supported on modern operating system releases (Red Hat 7.6 and above). Devicemapper uses block devices for storing layers and can be deployed using two different strategies: loop-lvm (the default, intended only for testing purposes) and direct-lvm (which requires additional block device pool configuration and is intended for production environments). The following link provides the required steps for deploying direct-lvm: https://docs.docker.com/storage/storagedriver/device-mapper-driver/
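You can check which storage driver your Docker engine selected. A minimal sketch (the --format expression assumes the standard docker info template fields):

    # Show the storage driver in use (for example, overlay2)
    docker info --format '{{.Driver}}'

    # The unformatted output also lists the backing filesystem details
    docker info | grep -A 3 'Storage Driver'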

As you may have noticed, using copy-on-write filesystems makes containers very small in terms of disk space usage. All common files are shared between containers based on the same image; the containers just store the differences from the immutable files that are part of the image layers. Consequently, container layers will be very small (of course, this depends on what you store in your containers, but keep in mind that good containers are small). When an existing file in a container has to be modified (remember, a file that comes from the underlying layers), the storage driver performs a copy operation up to the container layer. This process is fast, but keep in mind that everything that is going to be changed in a container will follow this process. As a rule of thumb, don't use copy-on-write for heavy I/O operations, nor for process logs.

Copy-on-write is a strategy for achieving maximum efficiency with small, layer-based filesystems. This storage strategy works by copying files between layers: when a layer needs to change a file from an underlying layer, the file is copied up to the top layer; if it only needs read access, the file is used directly from the underlying layers. This way, I/O access is minimized and the size of the layers is kept very small.
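A quick way to observe copy-on-write in action (the container name and image are illustrative) is to modify a file that originally came from an image layer and then list what actually landed in the container's read-write layer:

    # Start a container and modify a file that belongs to a read-only
    # image layer; the storage driver first copies it up to the
    # container's read-write layer
    docker run -d --name cow-demo nginx:alpine
    docker exec cow-demo sh -c 'echo "# tweak" >> /etc/nginx/nginx.conf'

    # List files added (A) or changed (C) in the container layer only
    docker diff cow-demo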

A common question that many people ask is whether containers are ephemeral. The short answer is no. In fact, containers are not ephemeral for a host. This means that when we create or run a container on a host, it will remain there until someone removes it. We can start a stopped container on the same host if it has not been deleted yet. Everything that was inside the container before will still be there, but a container is not a good place to store process state because that state is only local to its host. If we want to be able to run containers anywhere and use orchestration tools to manage their states, processes must use external resources to store their status.
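A short sketch of this behavior (the names are illustrative): a stopped container keeps its read-write layer until someone explicitly removes it:

    # Run a container and create some state inside it
    docker run -d --name stateful alpine:3.12 sleep 3600
    docker exec stateful sh -c 'echo "local state" > /tmp/state.txt'

    # Stop and start it again on the same host; the file is still there
    docker stop stateful
    docker start stateful
    docker exec stateful cat /tmp/state.txt

    # Only removing the container discards its read-write layer
    docker rm -f stateful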

As we'll see in later chapters, Swarm or Kubernetes will manage service or application component status and, if a required container fails, will create a new container. Orchestration creates a new container instead of reusing the old one because, in many cases, the new process will be executed elsewhere in the clustered pool of hosts. So, it is important to understand that application components that run as containers must be logically ephemeral and that their status should be managed outside the containers (in a database, an external filesystem, by notifying other services, and so on).

The same concept will be applied in terms of networking. Usually, you will let a container runtime or orchestrator manage container IP addresses for simplicity and dynamism. Unless strictly necessary, don't use fixed IP addresses, and let internal IPAMs configure them for you.

Networking in containers is based on host bridge interfaces and firewall-level NAT rules. The Docker container runtime manages the creation of virtual interfaces for containers and isolates processes on different logical networks by creating the aforementioned rules. We will see all the network options provided and their use cases in Chapter 4, Container Persistency and Networking. In addition, publishing an application is managed by the runtime, and orchestration adds different properties and many other options.
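As a hedged example (the network name, container names, and image are illustrative), containers attached to the same user-defined bridge network can reach each other by name, while a published port is exposed on the host through NAT:

    # Create a user-defined bridge network (a new bridge interface and
    # the corresponding NAT rules are created on the host)
    docker network create app-net

    # Containers on the same network can resolve and reach each other
    docker run -d --name backend --network app-net nginx:alpine
    docker run -d --name frontend --network app-net -p 8080:80 nginx:alpine
    docker exec frontend ping -c 1 backend

    # -p 8080:80 publishes container port 80 on host port 8080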

Using volumes will let us manage the interaction between the process and the container filesystem. Volumes will bypass the copy-on-write filesystem and hence writing will be much faster. In addition to this, data stored in a volume will not follow the container life cycle. This means that even if we delete the container that was using that volume, all the data that was stored there will remain until someone deletes it. We can define a volume as the mechanism we will use to persist data between containers. We will learn that volumes are an easy way to share data between containers and deploy applications that need to persist their data during the life of the application (for example, databases or static content). Using volumes will not increase container layer size, but using them locally will require additional host disk resources under the Docker filesystem/directory tree.
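A minimal sketch (the volume, container, and image names are illustrative) of a named volume that outlives the container using it:

    # Create a named volume and mount it into a container
    docker volume create app-data
    docker run -d --name worker -v app-data:/data alpine:3.12 sleep 3600
    docker exec worker sh -c 'echo "persistent" > /data/file.txt'

    # Removing the container does not remove the volume or its data
    docker rm -f worker
    docker volume ls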

Process isolation

As we mentioned previously, a kernel provides namespaces for process isolation. Let's review what each namespace provides. Each container runs with its own kernel namespaces for the following:

  • Processes: The main process will be the parent of all other ones within the container.
  • Network: Each container will get its own network stack with its own interfaces and IP addresses and will use host interfaces.
  • Users: We will be able to map container user IDs with different host user IDs.
  • IPC: Each container will have its own shared memory, semaphores, and message queues without conflicting with other processes on the host.
  • Mounts: Each container will have its own root filesystem and we can provide external mounts, which we will learn about in upcoming chapters.
  • UTS: Each container will get its own hostname and time will be synced with the host.

The following diagram represents a process tree from the host's perspective and from inside a container. Processes inside a container are namespaced and, as a result, their parent is the container's main process, which has a PID of 1 inside the container:
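We can observe this easily (the container name and image are illustrative): inside the container's PID namespace, the main process is PID 1, while on the host the same process has a normal host PID:

    # Run a container whose main process just sleeps
    docker run -d --name sleeper alpine:3.12 sleep 3600

    # Inside the container's PID namespace, sleep runs as PID 1
    docker exec sleeper ps

    # On the host, the same process is visible under a regular host PID
    ps -ef | grep 'sleep 3600'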

Namespaces have been available in the Linux kernel since version 2.6.26 (July 2008), and they provide the first level of isolation for a process running within a container so that it won't see others. This means that processes in one container cannot affect other processes running on the host or in any other container. The maturity of these kernel features allows us to trust Docker's namespace isolation implementation.

Networking is isolated too, as each container gets its own network stack, but communications will pass through host bridge interfaces. Every time we create a Docker network for containers, we create a new network bridge, which we will learn more about in Chapter 4, Container Persistency and Networking. This means that containers sharing a network, which is a host bridge interface, will see one another, but all other containers running on a different interface will not have access to them. Orchestration adds different approaches to container runtime networking but, at the host level, the rules described here still apply.

Host resources available to a container are managed by control groups. This isolation will not allow a container to bring down a host by exhausting its resources. You should not run containers with unlimited resources in production; resource limits must be mandatory in multi-tenant environments.
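A hedged sketch of setting such limits at container start time (the values and image are illustrative):

    # Limit the container to 256 MB of memory and half a CPU core
    docker run -d --name limited --memory 256m --cpus 0.5 nginx:alpine

    # Verify the limits and live consumption
    docker stats --no-stream limited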

Orchestration

This book contains a general chapter about orchestration, Chapter 7, Introduction to Orchestration, and two specific chapters devoted to Swarm and Kubernetes, respectively: Chapter 8, Orchestration Using Docker Swarm, and Chapter 9, Orchestration Using Kubernetes. Orchestration is the mechanism that manages container interactions, publishing, and health in clustered pools of hosts. It allows us to deploy an application based on many components or containers and keep it healthy during its entire life cycle. With orchestration, component updates are easy because it takes care of the required changes in the platform to reach the new, desired state.

Deploying an application using orchestration requires us to define the number of instances for our process or processes, the expected state, and instructions for managing its life cycle during execution. Orchestration provides new objects, communication between containers running on different hosts, features for running containers on specific nodes within the cluster, and the mechanisms to keep the required number of process replicas alive with the desired release version.

Swarm is included inside the Docker binaries and comes as standard. It is easy to deploy and manage. Its unit of deployment is known as a service. In a Swarm environment, we don't deploy containers directly, because containers are not managed by the orchestration layer. Instead, we deploy services, and those services are represented by tasks, which run containers in order to maintain the desired state.
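A minimal, hedged sketch of deploying such a service (the service name and image are illustrative):

    # Initialize a single-node swarm (run on the manager host)
    docker swarm init

    # Deploy a service with three replicas; Swarm creates the tasks and
    # the containers that keep the desired state
    docker service create --name web --replicas 3 -p 80:80 nginx:alpine

    # Tasks (and their containers) are scheduled across the cluster
    docker service ps web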

Currently, Kubernetes is the most widely used orchestrator. It requires extra deployment effort on top of the Docker community container runtime. It adds many features, multi-container objects known as pods that share a common networking layer, and flat networking for all orchestrated pods, among other things. Kubernetes is community-driven and evolves very fast. One of the features that makes this platform so popular is the ability to create your own kinds of resources, allowing us to develop new extensions when they are not available.
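As a rough, hedged equivalent on Kubernetes (the deployment name and image are illustrative; pods are covered properly in Chapter 9), a deployment creates pods and keeps the requested number of replicas running:

    # Create a deployment; Kubernetes schedules its pods across the cluster
    kubectl create deployment web --image=nginx:alpine
    kubectl scale deployment web --replicas=3

    # Pods are the (possibly multi-container) unit of scheduling
    kubectl get pods -o wide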

We will analyze the features of pods and Kubernetes in detail in Chapter 9, Orchestration Using Kubernetes.

Docker Enterprise provides both orchestrators, deployed under Universal Control Plane with high availability for all components.

Registry

We have already learned that containers execute processes within an isolated environment, created from a template image. So, the only requirements for deploying that container on a new node will be the container runtime and the template used to create that container. This template can be shared between nodes using simple Docker command options. But this procedure can become more difficult as the number of nodes grows. To improve image distribution, we will use image registries, which are storage points for these kinds of objects. Each image will be stored in its own repository. This concept is similar to code repositories, allowing us to use tags to describe these images, aligning code releases with image versioning.
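A short sketch of distributing an image through a registry (the registry hostname and repository are illustrative):

    # Tag the local image with a registry, repository, and tag
    docker tag myapp:1.0 registry.example.com/myteam/myapp:1.0

    # Push it once; any node with access can then pull the same template
    docker push registry.example.com/myteam/myapp:1.0
    docker pull registry.example.com/myteam/myapp:1.0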

An application deployment pipeline has different environments, and having a common point of truth between them will help us to manage these objects through the different workflow stages.

Docker provides two different approaches to registries: the community version and Docker Trusted Registry. The community version does not provide any security at all, nor role-based access to image repositories. On the other hand, Docker Trusted Registry comes with the Docker Enterprise solution and is an enterprise-grade registry, with built-in security, image vulnerability scanning, integrated workflows, and role-based access. We will learn about Docker Enterprise's registry in Chapter 13, Implementing an Enterprise-Grade Registry with DTR.
