CoreOS high-level architecture
The CoreOS node in a cluster comprises the following main components:
- etcd
- systemd
- fleet
- Docker/Rocket containers
The CoreOS node runs etcd, systemd, and the fleet service in all of the nodes in the cluster. etcd, which is running in all the nodes, talk to each other and elects one node as the master node. All the services running inside the node will be advertised to this master node, which makes etcd provide a service discovery mechanism. Similarly, fleetd running in different nodes maintains the list of services running in different nodes in its service pool, which provides service-level orchestration. fleetctl
and etcdctl
are command-line utilities to configure the fleet and etcd utilities respectively.
Refer to subsequent sections of this chapter to understand the functionality of each component in detail.
These components together provide three main functionalities for CoreOS as follows:
- Service discovery
- Cluster management
- Container management
Service discovery
In the CoreOS environment, all user applications are deployed as services inside a container that can either be a Docker container or a Rocket container. As different applications/services are running as separate containers in the CoreOS cluster, it is inevitable to announce the services provided by each node to all the nodes in the cluster. Along with service availability, it is also required that each service advertises the configuration parameters to other services. This service advertisement is very important when the services are tightly coupled and dependent on each other. For example, the web service should know details about the database services, about the connection string, or type of database and so on. CoreOS provides a way for each service to advertise its service and configuration information using the etcd service. The data announced to the etcd service will be given/announced to all the nodes in the cluster by the master node.
etcd
etcd is a distributed key value store that stores data across the CoreOS cluster. The etcd service is used for publishing services running on a node to all the other nodes in the cluster, so that all the services inside the cluster discover other services and configuration details of other services. etcd is responsible for electing the master node among the set of nodes in the cluster. All nodes in the cluster publish their services and configuration information to the etcd service of the master node, which provides this information to all the other nodes in the cluster.
Container management
The key element of the CoreOS building block is a container that can either be Docker or Rocket. The initial version of CoreOS officially supports Docker as the means for running any service application in the CoreOS cluster. In the recent version, CoreOS supports a new container mechanism called Rocket, even though CoreOS maintains backward compatibility with Docker support. All customer applications/services will be deployed as a container in the CoreOS cluster. When multiple services are running inside a server for different customers, it is inevitable to isolate the execution environment from one customer to another customer. Typically, in a VM-based environment, each customer will be given a VM and inside this VM the customer can run their own service, which provides complete isolation of the execution environment between customers. The container also provides a lightweight virtualization environment without running a separate copy of the VM.
Linux Container
Linux Container (LXC) is a lightweight virtualization environment provided by the Linux kernel to provide system-level virtualization without running a hypervisor. LXC provides multiple virtualized environments, each of them being inaccessible and invisible from the other. Thus, an application that is running inside one Linux container will not have access to the other containers.
LXC combines three main concepts for resource isolation as follows:
- Cgroups
- Namespaces
- Chroot
The following diagram explains in detail about LXC and the utilities required to provide LXC support:
Libvirt is a 'C' library toolkit that is used to interact with the virtualization capabilities provided by the Linux kernel. It acts as a wrapper layer for accessing the APIs exposed by the virtualization layer of the kernel.
cgroups
Linux cgroups is a feature provided by the kernel to restrict access to system resource for a process or set of processes. The Linux cgroup provides a way to reserve or allocate resources, such as CPU, system memory, network bandwidth and so on, to a group of processes/tasks. The administrator can create a cgroup and set the access levels for these resources and bind one or more processes to these groups. This provides fine-grained control over the resources in the system to different processes. This is explained in detail in the next diagram. The resources mentioned on the left-hand side are grouped into two different cgroups called cgroups-1 and cgroups-2. task1 and task2 are assigned to cgroups-1, which makes only the resources allocated for cgroups-1 available for task1 and task2.
Managing cgroups consists of the following steps:
- Creation of cgroups.
- Assign resource limit to the cgroup based on the problem statement. For example, if the administrator wants to restrict an application not to consume more that 50 percent of CPU, then he can set the limit accordingly.
- Add the process into the group.
As the creation of cgroups and allocation of resources happens outside of the application context, the application that is part of the cgroup will not be aware of cgroups and the level of resource allocated to that cgroup.
Namespace
namespace is a new feature introduced from Linux kernel version 2.6.23 to provide resource abstraction for a set of processes. A process in a namespace will have visibility only to the resources and processes that are part of that namespace alone. There are six different types of namespace abstraction supported in Linux as follows:
- PID/Process Namespace
- Network Namespace
- Mount Namespace
- IPC Namespace
- User Namespace
- UTS Namespace
Process Namespace provides a way of isolating the process from one execution environment to another execution environment. The processes that are part of one namespace won't have visibility to the processes that are part of other namespaces. Typically, in a Linux OS, all the processes are maintained in a tree with a child-parent relationship. The root of this process tree starts with a specialized process called init process whose process-id
is 1
. The init process is the first process to be created in the system and all the process that are created subsequently will be part of the child nodes of the process tree. The process namespace introduces multiple process trees, one for each namespace, which provides complete isolation of the processes running across different namespaces. This also brings the concept of a single process to have two different pids
: one is the global context and other in the namespace context. This is explained in detail in the following diagram.
In the following diagram, for namespace, all processes have two process IDs: one in the namespace context and the other in the global process tree.
Network Namespace provides isolation of the networking stack provided by the operating system for each container. Isolating the network stack for each namespace provides a way to run multiple same services, say a web server for different customers or a container. In the next diagram, the physical interface that is connected to the hypervisor is the actual physical interface present in the system. Each container will be provided with a virtual interface that is connected to the hypervisor bridging process. This hypervisor bridging process provides inter-container connectivity across the container, which provides a way for an application running in one container to talk to another application running in another container.
Chroot
Chroot is an operation supported by Linux OS to change the root directory of the current running process, which apparently changes the root directory of its child. The application that changes the root directory will not have access to the root directory of other applications. Chroot is also called chroot jail.
Combining the cgroups, namespace, and chroot features of the Linux kernel provides a sophisticated virtualized resource isolation framework with clear segregation of the data and resources across various processes in the system.
In LXC, the chroot utility is used to separate the filesystem, and each filesystem will be assigned to a container that provides each container with its own root filesystem. Each process in a container will be assigned to the same cgroup with each cgroup having its own resources providing resource isolation for a container.
Docker
Docker provides a portable way to deploy a service in any Linux distribution by creating a single object that contains the service. Along with the service, all the dependent services can be bundled together and can be deployed in any Linux-based servers or virtual machine.
Docker is similar to LXC in most aspects. Similar to LXC, Docker is a lightweight server virtualization infrastructure that runs an application process in isolation, with resource isolation, such as CPU, memory, block I/O, network, and so on. But along with isolation, Docker provides "Build, Ship and Run" modeling, wherein any application and its dependencies can be built, shipped, and run as a separate virtualized process running in a namespace isolation provided by the Linux operating system.
Dockers can be integrated with any of the following cloud platforms: Amazon Web Services, Google Cloud Platform, IBM Bluemix, Jelastic, Jenkins, Microsoft Azure, OpenStack Nova, OpenSVC, and configuration tools such as Ansible, CFEngine, Chef, Puppet, Salt, and Vagrant. The following are the main features provided by Docker.
The main objective of Docker is to support micro-service architecture. In micro-service architecture, a monolithic application will be divided into multiple small services or applications (called micro-services), which can be deployed independently on a separate host. Each micro-service should be designed to perform specific business logic. There should be a clear boundary between the micro-services in terms of operations, but each micro-service may need to expose APIs to different micro-services similar to the service discovery mechanism described earlier. The main advantage of micro-service is quick development and deployment, ease of debugging, and parallelism in the development for different components in the system. One of the main advantage of micro-services is based on the complexity, bottleneck, processing capability, and scalability requirement; every micro-service can be individually scaled.
Docker versus LXC
Docker is designed for deploying applications, whereas LXC is designed to deploy a machine. LXC containers are treated as a machine, wherein any applications can be deployed and run inside the container. Docker is designed to run a specific service or application to provide container as an application. However, when an application or service has a dependency with other services, these services can also be packed along with the same Docker image. Typically, the docker container doesn't provide all the services that will be provided by any OS, such as init systems, syslog, cron, and so on. As Docker is more focused on deploying applications, it provides tools to create a docker container and deploy the services using source code.
Docker containers are designed to have a layered architecture with each layer containing changes from the previous version. The layered architecture provides the docker to maintain the version of the complete container. Like any typical version control tools like Git/CVS, docker containers are maintained with a different version with operations like commit, rollback, version tracking, version diff, and so on. Any changes made inside the docker application will be made as a read-only layer until it is committed.
Docker-hub contains more than 14,000 containers available for various well-known services that can be downloaded and deployed very easily.
Docker provides an efficient mechanism for chaining different docker containers, which provides a good service chaining mechanism. Different docker containers can be connected to each other via different mechanisms as follows:
- Docker link
- Using docker0 bridge
- Using the docker container to use the host network stack
Each mechanism has its own benefits. Refer to Chapter 7, Creating a Virtual Tenant Network and Service Chaining Using OVS for more information about service chaining.
Docker uses libcontainer, which accesses the kernel's container calls directly rather than creating an LXC.
Rocket
Historically, the main objective of CoreOS is to run the services as a lightweight container. Docker's principle was aligning with the CoreOS service requirement with simple and composable units as container. Later on, Docker adds more and more features to make the Docker container provide more functionality than standard containers inside a monolithic binary. These functionalities include building overlay networks, tools for launching cloud servers with clustering, building images, running and uploading images, and so on. This makes Docker more like a platform rather than a simple container.
With the previously mentioned scenario, CoreOS started working on a new alternative to Docker with the following objectives:
- Security
- Composability
- Speed
- Image distribution
CoreOS announced the development of Rocket as an alternative to Docker to meet the previously mentioned requirements. Along with the development of Rocket, CoreOS also started working on an App Container Specification. The specification explains the features of the container such as image format, runtime environment, container discovery mechanism, and so on. CoreOS launched its first version of Rocket along with the App Container Specification in December 2014.
CoreOS cluster management:
Clustering is the concept of grouping a set of machines to a single logical system (called cluster) so that the application can be deployed in any one machine in the cluster. In CoreOS, clustering is one of the main features provided by CoreOS by running different services/docker container over the cluster of the machine. Historically, in most of the Linux distribution, services can be managed using the systemd utility. CoreOS extends the systemd service from a single node to a cluster using fleet utility. The main reason for CoreOS to choose fleet to orchestrate the services across the CoreOS cluster is as follows:
- Performance
- Journal support
- Rich syntax in deploying the services
It is also possible to have a CoreOS cluster with a combination of a physical server and virtual machines as long as all the nodes in the cluster are connected to each other and reachable. All the nodes that want to participate in the CoreOS cluster should run CoreOS with the same cluster ID.
systemd
systemd is an init system utility that is used to stop, start, and restart any of the Linux services or user programs. systemd has two main terminologies or concepts: unit and target. unit is a file that contains the configuration of the services to be started, and target is a grouping mechanism to group multiple services to be started at the same time.
fleet
fleet emulates all the nodes in the cluster to be part of a single init system or system service. fleet controls the systemd service at the cluster level, not in the individual node level, which allows fleet to manage services in any of the nodes in the cluster. fleet not only instantiates the service inside a cluster but also manages how the services are to be moved from one node to another when there is a node failure in the cluster. Thus, fleet guarantees that the service is running in any one of the nodes in the cluster. fleet can also take care of restricting the services to be deployed in a particular node or set of nodes in a cluster. For example, if there are ten nodes in a cluster and among the ten nodes a particular service, say a web server, is to be deployed over a set of three servers, then this restriction can be enforced when fleet instantiates a service over the cluster. These restrictions can be imposed by providing some information about how these jobs are to be distributed across the cluster. fleet has two main terminologies or concepts: engine and agents. For more information about systemd and fleet, refer to chapter Creating Your CoreOS Cluster and Managing the Cluster.