Setting up NameNode
In this section, we will walk step by step through the installation and basic configuration of the NameNode service, including the High Availability (HA) setup. Unlike many guides and tutorials available online, which treat NameNode HA as an advanced topic, we will set up NameNode HA from the beginning. The reason is the critical role NameNode plays in a Hadoop setup: NameNode is a single point of failure for a Hadoop cluster. Without this service, there is no way to access files on the Hadoop Distributed File System (HDFS).
There are several approaches to setting up NameNode High Availability. Prior to CDH 4.1, HA could be implemented using a shared storage setup. In this case, the primary NameNode writes filesystem metadata changes to an editlog located on shared network storage, and a secondary NameNode polls the editlog for changes and applies them to its own copy of the metadata snapshot. Additionally, all DataNodes updated...