Understanding Docker
Before we look at installing Docker, let's begin by getting an understanding of the problems that the Docker technology aims to solve.
Developers
The company behind Docker, also called Docker, has always described the program as fixing the 'it works on my machine' problem. This problem is best summed up by an image, based on the Disaster Girl meme, with the tagline 'Worked fine in dev, ops problem now', which started popping up in presentations, forums, and Slack channels a few years ago. While it is funny, it is, unfortunately, an all-too-real problem and one I have personally been on the receiving end of. Let's take a look at an example of what is meant by this.
The problem
Even in a world where DevOps best practices are followed, it is still all too easy for a developer's working environment to not match the final production environment.
For example, a developer using the macOS version of, say, PHP will probably not be running the same version as the Linux server that hosts the production code. Even if the versions match, you then have to deal with differences in the configuration and overall environment on which the version of PHP is running, such as differences in the way file permissions are handled between different operating system versions, to name just one potential problem.
All of this comes to a head when it is time for a developer to deploy their code to the host, and it doesn't work. So, should the production environment be configured to match the developer's machine, or should developers only do their work in environments that match those used in production?
In an ideal world, everything should be consistent, from the developer's laptop all the way through to your production servers; however, this utopia has traditionally been challenging to achieve. Everyone has their own way of working and their own personal preferences. Enforcing consistency across multiple platforms is difficult enough when a single engineer is working on the systems, let alone a team of engineers working with a team of potentially hundreds of developers.
The Docker solution
Using Docker for Mac or Docker for Windows, a developer can quickly wrap their code in a container defined by a Dockerfile, which they have either written themselves or created while working alongside a sysadmin or operations team. We will cover Dockerfiles in Chapter 2, Building Container Images, and Docker Compose files in more detail in Chapter 5, Docker Compose.
Programmers can continue to use their chosen integrated development environment (IDE) and maintain their workflows when working with the code. As we will see in the upcoming sections of this chapter, installing and using Docker is not difficult; considering how much of a chore it was to maintain consistent environments in the past, even with automation, Docker feels a little too easy – almost like cheating.
Operators
I have been working in operations for more years than I would like to admit, and the following problem has cropped up regularly.
The problem
Let's say you are looking after five servers dedicated to running Application 1: three load-balanced web servers and two database servers in a master/slave configuration. You are using a tool, such as Puppet or Chef, to automatically manage the software stack and configuration across your five servers.
Everything is going great until you are told that you need to deploy Application 2 on the same servers that are running Application 1. On the face of it, this is not a problem – you can tweak your Puppet or Chef configuration to add new users, add virtual hosts, pull the latest code down, and so on. However, you notice that Application 2 requires a newer version of the software stack than the one you are running for Application 1.
To make matters worse, you already know that Application 1 flat out refuses to work with the new software stack and that Application 2 is not backward compatible.
Traditionally, this leaves you with a few choices, all of which just add to the problem in one way or another:
- Ask for more servers? While this is traditionally the safest technical solution, it does not automatically mean that there will be the budget for additional resources.
- Re-architect the solution? Taking one web server and one database server out of the load balancer or replication and redeploying them with the software stack for Application 2 may seem like the next easiest option from a technical point of view. However, you are introducing single points of failure for Application 2 and reducing the redundancy for Application 1 as well: there was probably a reason why you were running three web and two database servers in the first place.
- Attempt to install the new software stack side-by-side on your servers? Well, this certainly is possible and may seem like a good short-term plan to get the project out of the door, but it could leave you with a house of cards that could come tumbling down when the first critical security patch is needed for either software stack.
The Docker solution
This is where Docker starts to come into its own. If you have Application 1 running across your three web servers in containers, you may be running more than three containers; in fact, you could already be running six, doubling up on the containers, allowing you to run rolling deployments of your application without reducing the availability of Application 1.
Deploying Application 2 in this environment is as easy as launching more containers across your three hosts and then routing to the newly deployed application using your load balancer. As you are just deploying containers, you do not need to worry about the logistics of deploying, configuring, and managing two versions of the same software stack on the same server.
We will work through an example of this exact scenario in Chapter 5, Docker Compose.
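To make the idea a little more concrete, here is a minimal Python sketch that builds, but does not execute, the `docker run` commands for one host running both applications side by side. The image names (`example/app1:old-stack` and `example/app2:new-stack`), the container names, the replica count, and the host ports are all hypothetical placeholders rather than values used later in this book; the only Docker features assumed are the `--detach`, `--name`, and `--publish` options of `docker run`.

```python
import shlex

# Hypothetical images: each one bundles its own version of the software stack,
# so the two stacks never have to be installed side by side on the host.
APPS = {
    "app1": "example/app1:old-stack",
    "app2": "example/app2:new-stack",
}


def run_command(name, image, host_port, container_port=80):
    """Build (not execute) a `docker run` command for one detached container."""
    return [
        "docker", "run", "--detach",
        "--name", name,
        "--publish", f"{host_port}:{container_port}",
        image,
    ]


def commands_for_host(apps=APPS, replicas=2, first_port=8081):
    """Return the commands that start `replicas` containers per application."""
    commands = []
    port = first_port
    for app, image in apps.items():
        for i in range(1, replicas + 1):
            commands.append(run_command(f"{app}-web-{i}", image, port))
            port += 1  # each container gets its own host port (an assumption)
    return commands


if __name__ == "__main__":
    # Print the commands so they can be reviewed before being run by hand.
    for cmd in commands_for_host():
        print(shlex.join(cmd))
```

Because each image carries its own software stack, the two applications share nothing on the host beyond the Docker engine itself; the load balancer only needs to know which host ports belong to which application.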
Enterprise
Enterprises suffer from the same problems faced by developers and operators, as they employ both; however, they do so on a much larger scale, and there is also a lot more risk involved.
The problem
Because of the risk as well as the fact that any downtime could cost sales or impact reputation, enterprises need to test every deployment before it is released. This means that new features and fixes are stuck in a holding pattern while the following takes place:
- Test environments are spun up and configured.
- Applications are deployed across the newly launched environments.
- Test plans are executed, and the application and configuration are tweaked until the tests pass.
- Requests for change are written, submitted, and discussed to get the updated application deployed to production.
This process can take anywhere from a few days to a few weeks, or even months, depending on the complexity of the application and the risk the change introduces. While the process is required to ensure continuity and availability for the enterprise at a technological level, it does potentially add risk at the business level. What if you have a new feature stuck in this holding pattern and a competitor releases similar, or worse still, the same functionality ahead of you?
This scenario could be just as damaging to sales and reputation as the downtime that the process was put in place to protect you against in the first place.
The Docker solution
Docker does not remove the need for a process, such as the one just described, to exist or be followed. However, as we have already touched upon, it does make things a lot easier because you are already working consistently: your developers have been working with the same container configuration that is running in production. This means that it is not much of a step for the same methodology to be applied to your testing.
For example, when a developer checks in their code, which they know works in their local development environment (as that is where they have been doing all of their work), your testing tool can launch the same containers to run your automated tests against. Once the containers have been used, they can be removed to free up resources for the next lot of tests. This means that, suddenly, your testing process and procedures are a lot more flexible, and you can continue to reuse the same environment, rather than redeploying or re-imaging servers for the next set of tests.
This streamlining of the process can be taken as far as having your new application containers pushed through to production.
The quicker this process can be completed, the faster you can confidently launch new features or fixes and keep ahead of the curve.
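The "launch, test, remove" cycle described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the image name `example/app:candidate` and the test command `pytest -q` are hypothetical, and the sketch relies solely on the `--rm` option of `docker run`, which removes the container once its command exits.

```python
import shlex
import shutil
import subprocess


def build_test_command(image, test_cmd):
    """Build a `docker run` command whose container is removed when it exits."""
    return ["docker", "run", "--rm", image, *test_cmd]


def run_tests(image, test_cmd):
    """Run the tests in a throwaway container and return the exit code.

    Returns None if the docker CLI is not installed on this machine.
    """
    if shutil.which("docker") is None:
        return None
    return subprocess.run(build_test_command(image, test_cmd)).returncode


if __name__ == "__main__":
    # Hypothetical image and test command; only printed here, not executed.
    cmd = build_test_command("example/app:candidate", ["pytest", "-q"])
    print(shlex.join(cmd))
```

A testing tool would typically treat a non-zero exit code from `run_tests` as a failed run; because the container is removed automatically, the host is immediately free for the next set of tests.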
So, we know what problems Docker was developed to solve. We now need to discuss what exactly Docker is and what it does.
The differences between dedicated hosts, virtual machines, and Docker
Docker is a container management system that helps us manage Linux containers more easily and universally (early versions of Docker were built on Linux Containers, or LXC). This lets you create images in virtual environments on your laptop and run commands against them. The actions you perform on the containers running in these environments locally on your machine will be the same commands or operations that you run against them when they are running in your production environment.
This means that you don't have to do things differently when you go from a development environment, such as the one on your local machine, to a production environment on your server. Now, let's take a look at the differences between Docker containers and typical virtual machine environments:
As you can see, for a dedicated machine, we have three applications, all sharing the same orange software stack. Running virtual machines allows us to run three applications, running two completely different software stacks. The following diagram shows the same three applications running in containers using Docker:
This diagram gives us a lot of insight into the most significant benefit of Docker, that is, there is no need for a complete operating system every time we need to bring up a new container, which cuts down on the overall size of containers. Since almost all Linux distributions use the standard kernel models, Docker relies on the Linux kernel of the host operating system, whichever distribution that happens to be, such as Red Hat, CentOS, or Ubuntu.
For this reason, you can have almost any Linux operating system as your host operating system and be able to layer other Linux-based operating systems on top of the host. That is, your applications are led to believe that a full operating system is installed, but in reality, we only install the binaries, such as a package manager and, for example, Apache/PHP, along with the libraries required to provide just enough of an operating system for your applications to run.
For example, in the earlier diagram, we could have Red Hat running for the orange application, and Debian running for the green application, but there would never be a need to actually install Red Hat or Debian on the host. Thus, another benefit of Docker is the size of images when they are created. They are built without the most significant piece of the operating system: the kernel. This makes them small, compact, and easy to ship.
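One way to see the shared kernel for yourself is the small Python sketch below, which reports the kernel release alongside the distribution name read from `/etc/os-release`. The function names are our own, and the expectation is an assumption about a typical Linux host: run on the host and then inside a container on that host, the distribution name should differ while the kernel release stays the same. With Docker for Mac or Docker for Windows, containers run inside a lightweight Linux virtual machine, so the kernel reported is that of the virtual machine rather than of macOS or Windows.

```python
import platform
from pathlib import Path


def kernel_release():
    """Return the release of the running kernel (the same value as `uname -r`)."""
    return platform.release()


def distribution_name(path="/etc/os-release"):
    """Return PRETTY_NAME from an os-release file, or 'unknown' if unavailable."""
    try:
        text = Path(path).read_text()
    except OSError:
        return "unknown"
    for line in text.splitlines():
        if line.startswith("PRETTY_NAME="):
            # Values are usually quoted, for example PRETTY_NAME="Some Linux 1.0"
            return line.split("=", 1)[1].strip().strip('"')
    return "unknown"


if __name__ == "__main__":
    print("Distribution:", distribution_name())
    print("Kernel:", kernel_release())
```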