Previously, we looked at YARN and gained a deeper understanding of its capabilities. This chapter introduces a process-oriented approach to managing, monitoring, and optimizing your Hadoop cluster. We have already covered part of administration when we set up a single node, a pseudo-distributed node, and a fully fledged distributed Hadoop cluster. We also covered cluster sizing, which is needed as part of the planning activity, and we have gone through some developer and system CLIs in the respective chapters on HDFS, MapReduce, and YARN. Hadoop administration is a vast topic, and you will find a lot of books dedicated to it in the market. Here, I will touch on the key points of monitoring, managing, and optimizing your cluster.
We will cover the following topics:
- Roles and responsibilities of Hadoop...