Clustering fundamentals
In computer science, a cluster consists of a group of computers (with each computer referred to as a node or member) that work together so that the set is seen as a single system from the outside.
Enterprise and scientific environments often require both redundancy and high computing power to analyze the massive amounts of data produced every day. For the results to be always available to the people who use or manage those services, we rely on the high availability and performance of computer systems. Internet websites that must perform well under a significant load, such as those used by banks and other commercial institutions, are a clear example of where clusters offer an advantage.
There are two typical cluster setups. The first one involves assigning a different task to each node, thus achieving higher performance than a single member could by performing all the tasks on its own. The other classic use of clustering is to help ensure high availability by providing failover capabilities, so that one node may automatically replace a failed member to minimize the downtime of one or several critical services. In either case, the concept of clustering implies not only taking advantage of the computing functionality of each member alone, but also maximizing it by complementing it with the others.
The failover-oriented setup is called high availability (HA), and it aims to eliminate system downtime by failing over services from one node to another in case one of them experiences an issue that renders it inoperative. As opposed to switchover, which requires human intervention, a failover procedure is performed automatically by the cluster, ideally without any noticeable downtime. In other words, this operation is transparent to end users and clients from outside the cluster.
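The failover idea can be illustrated with a minimal Python sketch. It is a toy model rather than how any particular cluster manager works: the node names (`node01`, `node02`), the `health` dictionary, and the `elect_active` function are all hypothetical, and a real cluster has to deal with many more details than this.

```python
def elect_active(nodes, health, current):
    """Return the name of the node that should run the service.

    nodes   -- ordered list of node names (hypothetical)
    health  -- dict mapping node name to True (healthy) or False (failed)
    current -- name of the node currently running the service
    """
    # Keep the service where it is as long as that node is healthy.
    if health.get(current, False):
        return current
    # Otherwise fail over to the first healthy node in the list.
    for name in nodes:
        if health.get(name, False):
            return name
    raise RuntimeError("no healthy node available to take over the service")


nodes = ["node01", "node02"]
health = {"node01": True, "node02": True}

active = elect_active(nodes, health, "node01")
print("service runs on", active)

# Simulate a failure of the active node: no human intervention is needed,
# the next call moves the service to the surviving member.
health["node01"] = False
active = elect_active(nodes, health, active)
print("service runs on", active)
```

In this sketch the decision is made by a single function call; the point is only that the replacement of the failed member is automatic, which is what distinguishes failover from switchover.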
The performance-oriented setup uses its nodes to perform operations in parallel in order to enhance the performance of one or more applications, and is called a high-performance cluster (HPC). HPCs are typically seen in scenarios involving applications and processes that use large collections of data.
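The split-and-combine pattern behind parallel processing can be sketched in a few lines of Python. This is only an illustration under stated assumptions: worker threads on a single machine stand in for cluster nodes, the `split` and `parallel_sum` helpers are hypothetical names, and the example shows how work is divided and recombined rather than demonstrating a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def split(data, parts):
    """Split data into `parts` contiguous chunks of nearly equal size."""
    size, extra = divmod(len(data), parts)
    chunks, start = [], 0
    for i in range(parts):
        # The first `extra` chunks each take one additional element.
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def parallel_sum(data, workers=2):
    """Sum each chunk on its own worker, then combine the partial results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(sum, split(data, workers)))
    return sum(partials)


# Two workers (standing in for two nodes) each sum half of the numbers 1..100.
print(parallel_sum(list(range(1, 101)), workers=2))  # 5050
```

Each worker handles only its own chunk, and the final result is assembled from the partial results, which mirrors how the members of a performance-oriented cluster complement one another.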
Why Linux and CentOS 7?
As mentioned earlier, we will build a cluster with two machines running Linux. This choice is supported by the low cost and stability of such a setup: there are no paid operating system or software licenses, and Linux can run on systems with limited resources (such as a Raspberry Pi or relatively old hardware). Thus, we can set up a cluster with very little hardware or money.
We will begin our own journey toward clustering by setting up the separate nodes that will make up our system. Our operating system of choice is Linux, with CentOS 7 as the distribution, which is the latest available release of CentOS as of now. Its binary compatibility with Red Hat Enterprise Linux® (one of the most widely used distributions in enterprise and scientific environments), along with its well-proven stability, are the reasons behind this decision.
Note
CentOS 7 is available for download, free of charge, from the project's web site at http://www.centos.org/. In addition to this, specific details about the release can always be consulted in release notes available through the CentOS wiki, http://wiki.centos.org/Manuals/ReleaseNotes/CentOS7.
Downloading CentOS
To download CentOS, go to http://www.centos.org/download/ and click on one of the three options outlined in the following screenshot:
- DVD ISO: This is an .iso file (~4 GB) that can be written into regular DVD optical media and includes the common tools. Download this file if you have permanent access to a reliable Internet connection that you can use to download other packages and utilities later.
- Everything ISO: This is an .iso file (~7 GB) with the complete set of packages made available in the base repository of CentOS 7. Download this file if you do not have access to a permanent Internet connection or if your plan contemplates the possibility of installing or populating a local or network mirror.
- Alternative downloads: This link will take you to a public directory within an official nearby CentOS mirror where the previous options are available along with others, including different choices of desktop versions (GNOME or KDE), and the minimal .iso file (~570 MB), which contains the core or essential packages of the distribution.
Although all three download options will work, we will use the minimal install, as it is sufficient for our purpose, and we can later install any other packages we need from public software repositories using yum, the standard CentOS package manager. The recommended .iso file to download is the latest one available from the download page, which at the time of writing is CentOS 7.0 1406 x86_64 Minimal.iso.
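Once the download finishes, it is a common precaution to check that the image was not corrupted in transit by comparing its SHA-256 digest with the value published alongside the image on the mirror. The following Python sketch shows one way to do this; the `sha256_of` and `matches_published` helpers are hypothetical names introduced here, and the path and the published digest passed to them are values you would supply yourself.

```python
import hashlib


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the hexadecimal SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Reading in chunks avoids loading a multi-gigabyte image into memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, published_hex):
    """Compare the local digest with a published one, ignoring case and whitespace."""
    return sha256_of(path) == published_hex.strip().lower()
```

Calling `matches_published` with the path of the downloaded .iso file and the digest copied from the mirror returns `True` only if the two agree; if it returns `False`, downloading the image again is the safest course.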