Summary
This chapter provides step-by-step guidelines for setting up and configuring the core Hadoop components. We started by identifying the most critical OS settings that need to be adjusted for a Hadoop server. Then, we walked through the steps to set up NameNode, DataNode, JobTracker, and TaskTracker using the CDH distribution on CentOS Linux. To eliminate the single point of failure in HDFS, we configured a NameNode High Availability cluster using a quorum of JournalNodes. By following these steps, you can build a fully functional, production-ready cluster. The Hadoop core components are enough to start performing useful work, but over time many ecosystem projects have evolved around Hadoop, and some of them have become must-have cluster components. In the next chapter, Configuring the Hadoop Ecosystem, we will enrich our cluster with the additional functionality these projects provide.