Operations in the Hadoop 2 world
As mentioned in Chapter 2, Storage, some of the most significant changes made to HDFS in Hadoop 2 involve improvements to its fault tolerance and its integration with external systems. This is not just a curiosity: the NameNode High Availability (HA) features, in particular, have made a massive difference to cluster management since Hadoop 1. In the bad old days of 2012 or so, a significant part of the operational preparedness of a Hadoop cluster was built around mitigating, and recovering from, failure of the NameNode. If the NameNode died in Hadoop 1 and you didn't have a backup of the HDFS fsimage metadata file, you basically lost access to all your data. If the metadata was permanently lost, then so was the data.
Hadoop 2 adds built-in NameNode HA and the machinery to make it work. In addition, components such as the NFS gateway into HDFS make it a much more flexible system. But this additional capability does...