HDFS high availability
The NameNode is the heart of an HDFS namespace. The availability of any cluster using HDFS therefore depends directly on the availability of the NameNode.
Secondary NameNode, Checkpoint Node, and Backup Node
Hadoop 1.X introduced the concept of a Secondary NameNode as a safeguard against disasters. If the NameNode fails, the Secondary NameNode can be used to recover it. The term Secondary NameNode is a misnomer: it is a cold standby and cannot service requests on its own. The NameNode can, however, read from the Secondary NameNode when it encounters failures.
The NameNode writes all HDFS updates to the edits log in the native filesystem. The log is written in an append-only fashion. The NameNode owns another file, the fsimage file, which contains the image of HDFS. When a NameNode starts up, it reads the edits file and applies all the edits one by one to the fsimage file. During this time, no writes are allowed on HDFS. The NameNode...
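The replay of the edits log onto the fsimage at startup can be illustrated with a small sketch. This is a toy model, not Hadoop code: the Edit record, the operation names (mkdir, create, delete), and the replay function are hypothetical, and the namespace is reduced to a set of paths. It only shows the idea of applying an append-only log, in order, to a saved image.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Edit:
    """One record of the append-only log (hypothetical, simplified format)."""
    op: str    # "mkdir", "create", or "delete"
    path: str


def replay(fsimage: set[str], edits: list[Edit]) -> set[str]:
    """Apply every edit, in log order, to a copy of the saved image."""
    namespace = set(fsimage)
    for edit in edits:
        if edit.op in ("mkdir", "create"):
            namespace.add(edit.path)
        elif edit.op == "delete":
            # Remove the path itself and everything beneath it.
            prefix = edit.path.rstrip("/") + "/"
            namespace = {
                p for p in namespace
                if p != edit.path and not p.startswith(prefix)
            }
        else:
            raise ValueError(f"unknown operation: {edit.op}")
    return namespace


# Saved image plus the edits recorded since it was written.
fsimage = {"/", "/data"}
edits = [
    Edit("mkdir", "/data/logs"),
    Edit("create", "/data/logs/a.txt"),
    Edit("create", "/tmp.txt"),
    Edit("delete", "/data/logs"),
]

result = replay(fsimage, edits)
print(sorted(result))  # ['/', '/data', '/tmp.txt']
```

Because the edits are applied strictly in order, the time this takes grows with the length of the log, which is one way to see why a long edits log lengthens the period during which no writes are allowed.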