There are many topics related to container security. In this chapter, we will review the ones related to the container runtime.
As we have seen, Docker provides a client-server environment. On the client side, there are a few things we can do to improve the security of our access to the environment.
Configuration files and certificates for different clusters on hosts must be secured using filesystem security at the operating system level. However, as you will have noticed, a Docker client always needs a server in order to do anything with containers; the Docker client is just a tool for connecting to servers. With this picture in mind, client-server security is a must. Now, let's take a look at the different kinds of access to the Docker daemon.
Docker client-server security
The Docker daemon will listen on system sockets (unix, tcp, and fd). By default, it listens on the local Unix socket /var/run/docker.sock, and, as we have seen, this behavior can be changed.
Giving users read-write access to /var/run/docker.sock grants them access to the local Docker daemon. This allows them to build images, run containers (even privileged, root user containers with local filesystems mounted inside them), and more. It is very important to know who can use your Docker engine. If you have deployed a Docker Swarm cluster, this is even worse, because if the accessed host has a manager role, the user will be able to create a service that runs containers across the entire cluster. So, keep your Docker daemon socket safe from non-trusted users and only allow authorized ones (in fact, we will look at other advanced mechanisms for providing secure user access to the container platform).
The Docker daemon is secure by default because it does not export its service. We can enable remote TCP access by adding -H tcp://<HOST_IP> to the Docker daemon start process. By default, port 2375 will be used. If we use 0.0.0.0 as the host IP address, the Docker daemon will listen on all interfaces.
We can enable remote access to the Docker daemon using a TCP socket. By default, this communication is not secure and the daemon will listen on port 2375. To ensure that the client-to-daemon connection is encrypted, you will need to use either a reverse proxy or the built-in TLS-based HTTPS encrypted socket. We can make the daemon listen on all host interface IP addresses, or on just one, by specifying that IP when starting the daemon. To use TLS-based communications, we need to follow this procedure (assuming your server hostname is in the $HOST variable):
- Create a certificate authority (CA). The following commands will create its private and public keys:
$ openssl genrsa -aes256 -out ca-key.pem 4096
Generating RSA private key, 4096 bit long modulus
............................................................................................................................................................................................++
........++
e is 65537 (0x10001)
Enter pass phrase for ca-key.pem:
Verifying - Enter pass phrase for ca-key.pem:
$ openssl req -new -x509 -days 365 -key ca-key.pem -sha256 -out ca.pem
Enter pass phrase for ca-key.pem:
You are about to be asked to enter information that will be incorporated
into your certificate request.
What you are about to enter is what is called a Distinguished Name or a DN.
There are quite a few fields but you can leave some blank
For some fields there will be a default value,
If you enter '.', the field will be left blank.
-----
Country Name (2 letter code) [AU]:
State or Province Name (full name) [Some-State]:Queensland
Locality Name (eg, city) []:Brisbane
Organization Name (eg, company) [Internet Widgits Pty Ltd]:Docker Inc
Organizational Unit Name (eg, section) []:Sales
Common Name (e.g. server FQDN or YOUR name) []:$HOST
Email Address []:Sven@home.org.au
- Create a CA-signed server key, ensuring that the Common Name matches the hostname you will use to connect to the Docker daemon from the client:
$ openssl genrsa -out server-key.pem 4096
Generating RSA private key, 4096 bit long modulus
.....................................................................++
.................................................................................................++
e is 65537 (0x10001)
$ openssl req -subj "/CN=$HOST" -sha256 -new -key server-key.pem -out server.csr
$ echo subjectAltName = DNS:$HOST,IP:10.10.10.20,IP:127.0.0.1 >> extfile.cnf
$ echo extendedKeyUsage = serverAuth >> extfile.cnf
$ openssl x509 -req -days 365 -sha256 -in server.csr -CA ca.pem -CAkey ca-key.pem \
-CAcreateserial -out server-cert.pem -extfile extfile.cnf
Signature ok
subject=/CN=your.host.com
Getting CA Private Key
Enter pass phrase for ca-key.pem:
- Start the Docker daemon with TLS enabled, passing arguments for the CA, the server certificate, and the CA-signed server key. The chmod commands also protect the client files (cert.pem and key.pem) that will be created in the next step. The Docker daemon using TLS will run on port 2376 (the standard port for daemon TLS):
$ chmod -v 0400 ca-key.pem key.pem server-key.pem
$ chmod -v 0444 ca.pem server-cert.pem cert.pem
$ dockerd --tlsverify --tlscacert=ca.pem --tlscert=server-cert.pem --tlskey=server-key.pem \
-H=0.0.0.0:2376
- Using the same CA, create a CA-signed client key, specifying that this key will be used for client authentication:
$ openssl genrsa -out key.pem 4096
Generating RSA private key, 4096 bit long modulus
.........................................................++
................++
e is 65537 (0x10001)
$ openssl req -subj '/CN=client' -new -key key.pem -out client.csr
$ echo extendedKeyUsage = clientAuth > extfile-client.cnf
$ openssl x509 -req -days 365 -sha256 -in client.csr -CA ca.pem -CAkey ca-key.pem \
-CAcreateserial -out cert.pem -extfile extfile-client.cnf
Signature ok
subject=/CN=client
Getting CA Private Key
Enter pass phrase for ca-key.pem:
- We will move the generated client certificates to the client's host (the client's laptop, for example), along with a copy of the public CA certificate file. With its own client certificate and the CA, the client will be able to connect to a remote Docker daemon using TLS to secure the communications. We will use the Docker command line with --tlsverify and the other arguments to specify the same CA as the server, the client certificate, and its signed key (the daemon's default port for TLS communications is 2376). Let's review an example using docker version:
$ docker --tlsverify --tlscacert=ca.pem --tlscert=cert.pem --tlskey=key.pem -H=$HOST:2376 version
All these steps are required to provide TLS communications, and steps 4 and 5 should be repeated for each client if we want to identify their connections individually (that is, if you don't want to share a single client certificate/key pair). In enterprise environments, with hundreds or even thousands of users, this is ungovernable; Docker Enterprise provides a better solution, with all these steps automated, thereby providing fine-grained access.
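The per-command TLS flags can also be replaced by environment variables and default file locations, which avoids repeating them on every invocation. A minimal sketch, using the file names generated in the steps above:

```shell
# The Docker client looks for ca.pem, cert.pem and key.pem under
# ~/.docker by default, so the --tls* flags can be omitted:
#   mkdir -pv ~/.docker
#   cp -v ca.pem cert.pem key.pem ~/.docker
# Environment variables replace the remaining arguments
# ($HOST is the daemon hostname used when signing the server key):
export DOCKER_HOST=tcp://$HOST:2376
export DOCKER_TLS_VERIFY=1
# From here on, a plain "docker version" uses the TLS-secured socket.
```

Anything set this way applies to every subsequent Docker command in the session, which is convenient for administrators who work against a single remote daemon.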
Since Docker version 18.09, we can also interact with the Docker daemon over SSH, using a command such as $ docker -H ssh://me@example.com:22 ps, for example. To use the SSH connection, you need to set up SSH public key authentication.
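Setting up the required public key authentication can be sketched as follows; the key file name, user, and hostname here are only examples:

```shell
# Generate a dedicated key pair for reaching the Docker host
# (the file name is only an example):
ssh-keygen -t ed25519 -N "" -q -f ./docker_admin_key
# Install the public key on the daemon's host (hypothetical user/host):
#   ssh-copy-id -i ./docker_admin_key.pub me@example.com
# The client can then run any command through the SSH transport:
#   docker -H ssh://me@example.com ps
```

This approach reuses the host's existing SSH access controls, so no extra port needs to be opened on the daemon's host.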
Docker daemon security
Docker container runtime security is based on the following:
- Security provided by the kernel to containers
- The attack surface of the runtime itself
- Operating system security applied to the runtime
Let's take a look at these in more detail.
Namespaces
We have been talking about kernel namespaces and how they implement the required isolation for containers. Every container runs with the following namespaces:
- pid: Process isolation (Process ID – PID)
- net: Manages network interfaces (Networking – NET)
- ipc: Manages access to IPC resources (InterProcess Communication – IPC)
- mnt: Manages filesystem mount points (Mount – MNT)
- uts: Isolates kernel and version identifiers (Unix Timesharing System – UTS)
As each container runs with its own pid namespace, it will only have access to the processes listed in that namespace. The net namespace provides each container with its own network interfaces, which allows us to start many processes using the same port on different containers. Inter-container visibility is enabled by default, and all containers have access to external networks through host bridge interfaces.
A complete root filesystem exists inside each container, and the container uses it as a standard Unix filesystem (with its own /tmp, and network files such as /etc/hosts and /etc/resolv.conf). This dedicated filesystem is based on copy-on-write, using the different layers of the image.
Namespaces provide layers of isolation for the container, and control groups manage how many resources are available to it. This ensures that the host does not get exhausted. In multi-tenant environments, or simply in production, it is very important to manage container resources and not to allow containers to run without limits.
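The namespaces a process belongs to are visible under /proc, which makes the isolation easy to inspect from the host side:

```shell
# Every process exposes its namespace membership as symbolic links under
# /proc/<pid>/ns; for a containerized process, the pid, net, mnt, uts and
# ipc links would point to different namespace IDs than the host's PID 1.
ls /proc/self/ns
```

Comparing this output for a container's process and for the host's init process shows exactly which namespaces the container runtime created.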
The attack surface of the daemon is based on user access. By default, the Docker daemon does not provide any role-based access control solution, but as we have seen, we can ensure encrypted communication for external clients.
As the Docker daemon runs as root (experimental rootless mode allows us to avoid this), all containers will be able to, for example, mount any directory on your host. This can be a real problem, and that is why it is so important to ensure that only required users have access to the Docker socket (local or remote).
As we will see in Chapter 3, Running Docker Containers, containers will run as root if we don't specify a user on image building or container startup. We will review this topic later and improve this default user usage.
It is recommended to run only the Docker daemon on dedicated servers, because Docker can be dangerous in the wrong hands with respect to other services running on the same host.
User namespace
As we've already seen, Linux namespaces provide isolation for processes. Processes only see what cgroups and their namespaces offer, and as far as they are concerned, they are running on their own.
We always recommend running processes inside containers as non-root users (nginx, for example, does not require root if we use ports above 1024), but there are some cases where containers must run as root. To prevent privilege escalation from within these root containers, we can apply user remapping. This mechanism maps the root user (UID 0) inside the container to a non-root user on the Docker host (UID 30000, for example).
User remapping is managed by two files:
- /etc/subuid: This sets the subordinate user ID range.
- /etc/subgid: This sets the subordinate group ID range.
With these files, we set the first ID and the range size for subordinate users and groups, respectively. This is an example of the subordinate ID format: nonroot:30000:65536. This means that UID 0 inside the container will be mapped to UID 30000 on the Docker host, UID 1 to UID 30001, and so on.
We will configure the Docker daemon to use this user remapping with the --userns-remap flag, or the userns-remap key in JSON format. In special cases, we can change the user namespace behavior when running a specific container.
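A minimal sketch of the files involved could look as follows; the nonroot user name and the ranges are examples, matching the mapping described above:

```
# /etc/subuid and /etc/subgid (one matching line in each file):
nonroot:30000:65536

# /etc/docker/daemon.json:
{
    "userns-remap": "nonroot"
}
```

After restarting the daemon, files created by a container's root user appear on the host as owned by UID 30000, so a process that escapes the container has no real root privileges on the host.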
Kernel capabilities (seccomp)
By default, Docker starts containers with a restricted set of capabilities. This means that containers will run unprivileged by default. So, running processes inside containers improves application security by default.
These are the 14 capabilities available by default to any container running in your system: SETPCAP, MKNOD, AUDIT_WRITE, CHOWN, NET_RAW, DAC_OVERRIDE, FOWNER, FSETID, KILL, SETGID, SETUID, NET_BIND_SERVICE, SYS_CHROOT, and SETFCAP.
The most important thing to understand at this point is that we can run processes inside a container that listen on ports under 1024 because we have the NET_BIND_SERVICE capability, for example, or that we can use ICMP inside containers because the NET_RAW capability is enabled.
On the other hand, many capabilities are not enabled by default. For example, many system operations need the SYS_ADMIN capability, and we need the NET_ADMIN capability to create new interfaces (running openvpn inside a Docker container requires it).
Processes will not have real root privileges inside containers. Using seccomp and capabilities, it is possible to do the following:
- Deny mount operations
- Deny access to raw sockets (to prevent packet spoofing)
- Deny access to some filesystem operations, such as file ownership
- Deny module loading, and many others
The permitted system calls are defined using a seccomp profile. Docker uses seccomp in filter mode, blocking all calls that are not explicitly allowed in profile files written in its own JSON format. There is a default profile that is used when running containers, and we can supply our own seccomp profile using the --security-opt flag on launch. So, manipulating the allowed capabilities and system calls is easy during container execution. We will learn more about how to manipulate the behavior of any container at the start of Chapter 3, Running Docker Containers:
$ docker container run --cap-add=NET_ADMIN --rm -it --security-opt seccomp=custom-profile.json alpine sh
This line runs our container with the NET_ADMIN capability added. Using a custom seccomp profile, we can allow or deny even more, as defined in custom-profile.json. For security reasons, we can also use --cap-drop to drop some of the default capabilities if we are sure that we don't need them.
Avoid using the --privileged flag, as your container will run unconfined, which means it will run with nearly the same access to the host as processes running outside containers. In this case, resources will be unlimited for this container (the SYS_RESOURCE capability will be enabled and limit flags will not be used). The best practice is to remove all capabilities except those required by the process to work.
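To illustrate the profile format, the following is a deliberately tiny sketch of a custom seccomp profile. A real profile, such as Docker's default one, allows several hundred system calls, so this fragment is for illustration only and would not support a real workload:

```json
{
    "defaultAction": "SCMP_ACT_ERRNO",
    "architectures": ["SCMP_ARCH_X86_64"],
    "syscalls": [
        {
            "names": ["read", "write", "close", "exit", "exit_group", "futex", "nanosleep"],
            "action": "SCMP_ACT_ALLOW"
        }
    ]
}
```

With SCMP_ACT_ERRNO as the default action, any system call not listed simply fails with an error inside the container instead of killing the process, which makes misconfigured profiles easier to debug.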
Linux security modules
Linux operating systems provide tools to ensure security. In some cases, they come installed and configured by default in out-of-the-box installations, while in other cases, they will require additional administrator interaction.
AppArmor and SELinux are probably the most common. Both provide finer-grained control over file operations and other security features; for example, we can ensure that only an allowed process can modify certain special files or directories (/etc/passwd, for example).
Docker provides templates and policies, installed with the product, that ensure complete integration with these tools to harden Docker hosts. Never disable SELinux or AppArmor in production; instead, use policies to add features or access for your processes.
We can review which security modules are enabled in our Docker runtime by looking at the SecurityOptions section of the Docker system info output.
We can easily review Docker runtime features using docker system info. It is good to know that the output can be displayed in JSON format using docker system info --format '{{json .}}', and that Go template formatting lets us retrieve specific keys. For example, docker system info --format '{{json .SecurityOptions}}' returns only the security options applied to the daemon.
By default, the Docker daemon on Red Hat flavor hosts will not run with its SELinux support enabled but, on the other hand, Ubuntu will run by default with AppArmor.
There is a very common issue when we move the default Docker data root path to another location on Red Hat Linux. If SELinux is enabled (the default on these systems), you will need to add the new path to the allowed context using # semanage fcontext -a -e /var/lib/docker _MY_NEW_DATA-ROOT_PATH, followed by # restorecon -R -v _MY_NEW_DATA-ROOT_PATH.
Docker Content Trust
Docker Content Trust is the mechanism provided by Docker to improve content security. It will provide image ownership and verification of immutability. This option, which is applied at Docker runtime, will help to harden content execution. We can ensure that only certain images can run on Docker hosts. This will provide two different levels of security:
- Only allow signed images
- Only allow signed images by certain users or groups/teams (we will learn about the concepts that are integrated with Docker UCP in Chapter 11, Universal Control Plane)
We will learn about volumes, which are the objects used for container persistent storage, in Chapter 4, Container Persistency and Networking.
Enabling and disabling Docker Content Trust can be managed by setting the DOCKER_CONTENT_TRUST=1 environment variable in a client session or in the systemd Docker unit. Alternatively, we can use --disable-content-trust=false on image and container operations (the flag defaults to true, meaning verification is skipped).
With content trust enabled by any of these methods, all Docker operations will be trusted, which means that we won't be able to download and execute any untrusted (unsigned) images.
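A quick sketch of enabling content trust for a shell session; the registry and image names here are hypothetical:

```shell
# Enable content trust for every command in this shell session:
export DOCKER_CONTENT_TRUST=1
echo "content trust enabled: $DOCKER_CONTENT_TRUST"
# With the variable set, pulling or running an unsigned image fails
# (requires a daemon and registry access, hence shown as comments):
#   docker pull registry.example.com/unsigned-image:latest
# A single command can opt out again with the per-command flag:
#   docker pull --disable-content-trust registry.example.com/unsigned-image:latest
```

Setting the variable in the systemd unit instead of a shell session makes the policy apply to every client session on that host.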