Namenode HA using shared storage
In Hadoop, we do not recommend NAS or SAN as storage for Datanodes, as it defeats the purpose of localized data. However, for critical components such as Namenode, there will be a storage mount point to store Namenode metadata. This is specified as a comma-separated list under the dfs.namenode.name.dir
parameter.
For Namenode High Availability (HA), we need a shared location to store metadata, which can be accessed from both Namenodes. Only primary or active Namenodes can write to the shared location, but both Namenodes can read from it.
The active Namenode is the writer and the standby node is the reader node only. Namenode can failover from one node to another, but only one node can be Active at any given time. Another important thing to keep in mind is that the Datanodes talk to both the Namenodes so that after failure between Namenodes, there is no time taken to update the block report.
The shared storage can be any NFS server or a simple filer that can...