Chapter 4. Exploring HDFS Federation and Its High Availability
You are now ready to set up a Hadoop cluster using CDH5. Once the cluster is up and running, you are responsible for managing it and keeping it available at all times. This chapter covers techniques for managing HDFS efficiently and for handling the single point of failure in a Hadoop cluster. We will cover the following topics:
- Configuring HDFS Federation
- HDFS high availability using Quorum-based storage or shared storage on Network File System (NFS)
- Jobtracker high availability
The heart of HDFS is the namenode, which tracks the location of every data block in the cluster. To serve requests quickly, the namenode holds all of this information in memory. For small clusters, the information is lightweight, and in most cases a moderate amount of RAM is enough to hold everything required to maintain the cluster. However, when the number of datanodes increases...