Implementing HDFS Federation
HDFS Federation is a technique for splitting the filesystem namespace into multiple parts. Each part is managed by its own namenode, so a single cluster runs multiple namenodes.
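As a rough illustration of what splitting the namespace means, the following Python sketch maps path prefixes to the namenode that manages them and resolves a path by longest matching prefix. The mount table, its prefixes, and the function name are hypothetical and chosen only for illustration; this is not Hadoop code and does not show how a real cluster is configured.

```python
# Hypothetical mount table: each path prefix is one part of the namespace,
# mapped to the namenode assumed to manage that part.
MOUNT_TABLE = {
    "/user": "namenode-1",
    "/data": "namenode-2",
}


def resolve_namenode(path, mount_table=MOUNT_TABLE):
    """Return the namenode whose prefix is the longest match for path."""
    best = None
    for prefix in mount_table:
        # Match whole path components only, so "/userdata" does not match "/user".
        if path == prefix or path.startswith(prefix.rstrip("/") + "/"):
            if best is None or len(prefix) > len(best):
                best = prefix
    if best is None:
        raise KeyError(f"no namenode manages {path}")
    return mount_table[best]
```

Under these assumptions, a path such as /user/alice/a.txt resolves to namenode-1 and a path under /data resolves to namenode-2; a path outside every prefix raises an error.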
In the following diagram, you will see two namenodes, Namenode-1 (NN1) and Namenode-2 (NN2).
Each namenode manages a namespace volume, which consists of the namespace metadata and the block pool information. The namespace metadata describes the files and directories present in HDFS, including which blocks make up each file. A block pool is the collection of data blocks that belong to a single namespace in a Hadoop cluster.
Both namenodes share the same set of datanodes in the cluster, and each datanode stores blocks for every namenode. However, the two namenodes do not communicate with each other. The preceding diagram shows only two namenodes; in production environments, you may have more than two.
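The relationship between namenodes, block pools, and shared datanodes can be sketched as a toy model in Python. Everything here is an assumption made for illustration: the class names, the block pool IDs BP-1 and BP-2, the block ID format, and the simplification that every block is stored on every datanode. It is not how Hadoop implements block placement.

```python
from dataclasses import dataclass, field


@dataclass
class Datanode:
    name: str
    # Block pool ID -> set of block IDs stored for that pool.
    block_pools: dict = field(default_factory=dict)

    def store(self, block_pool_id, block_id):
        self.block_pools.setdefault(block_pool_id, set()).add(block_id)


@dataclass
class Namenode:
    name: str
    block_pool_id: str
    # Namespace metadata: file path -> list of block IDs.
    namespace: dict = field(default_factory=dict)
    _next_block: int = 0

    def add_file(self, path, num_blocks, datanodes):
        block_ids = []
        for _ in range(num_blocks):
            block_id = f"blk_{self._next_block}"
            self._next_block += 1
            # Toy placement (assumption): store the block on every datanode.
            for dn in datanodes:
                dn.store(self.block_pool_id, block_id)
            block_ids.append(block_id)
        self.namespace[path] = block_ids
        return block_ids


# Two namenodes share the same datanodes but never reference each other.
datanodes = [Datanode("dn1"), Datanode("dn2")]
nn1 = Namenode("NN1", "BP-1")
nn2 = Namenode("NN2", "BP-2")
nn1.add_file("/user/a.txt", 2, datanodes)
nn2.add_file("/data/b.txt", 1, datanodes)
```

In this model each datanode ends up holding one set of blocks per block pool, and the two namenodes keep separate namespaces. Both pools contain a block named blk_0 without conflict, because a block is identified within its own pool.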
With such an architecture in place, it is possible to scale the cluster to a large number of nodes...