You're reading from Hadoop 2.x Administration Cookbook Administer and maintain large Apache Hadoop clusters

Product type Paperback

Published in May 2017

Publisher Packt

ISBN-13 9781787126732

Length 348 pages

Edition 1st Edition

Tools

Hadoop

Concepts

System Administration

Author (1):

Aman Singh


Table of Contents (14) Chapters

Preface

1. Hadoop Architecture and Deployment

2. Maintaining Hadoop Cluster HDFS

3. Maintaining Hadoop Cluster – YARN and MapReduce

4. High Availability

5. Schedulers

6. Backup and Recovery

7. Data Ingestion and Workflow

8. Performance Tuning

9. HBase Administration

10. Cluster Planning

11. Troubleshooting, Diagnostics, and Best Practices

12. Security

Index

Installing a single-node cluster - YARN components

In the previous recipe, we discussed how to set up Namenode and Datanode for HDFS. In this recipe, we will be covering how to set up YARN on the same node.

After completing this recipe, there will be four daemons running on the nn1.cluster1.com node, namely namenode, datanode, resourcemanager, and nodemanager daemons.

Getting ready

For this recipe, you will again use the same node on which we have already configured the HDFS layer.

All operations will be done by the hadoop user.

How to do it...

1. Log in to the node nn1.cluster1.com and switch to the hadoop user.
2. Change to the /opt/cluster/hadoop/etc/hadoop directory and configure the files mapred-site.xml and yarn-site.xml.
3. The file yarn-site.xml specifies the shuffle class, the scheduler, and the resource management components of the ResourceManager. You only need to specify yarn.resourcemanager.address; the rest are picked up automatically by the ResourceManager. If needed, they can also be set separately as independent components.
4. Once the configurations are in place, start the resourcemanager and nodemanager daemons.
5. The environment variables defined in /etc/profile.d/hadoopenv.sh include YARN_HOME and YARN_CONF_DIR, which tell the framework where the YARN configuration is located.
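
The configuration described above can be sketched as follows. This is a minimal illustration, not the author's exact files: it assumes the standard Hadoop 2.x property names, assumes 8032 as the ResourceManager port, and uses a hypothetical helper function, write_yarn_configs, that writes both files into the directory passed as its argument.

```shell
# Hypothetical helper (not from the book): writes minimal YARN settings
# for a single-node cluster into the directory given as $1.
write_yarn_configs() {
  conf_dir="$1"

  # mapred-site.xml: run MapReduce jobs on YARN.
  cat > "$conf_dir/mapred-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
EOF

  # yarn-site.xml: shuffle service for the NodeManager and the
  # ResourceManager address (port 8032 is an assumption).
  cat > "$conf_dir/yarn-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>nn1.cluster1.com:8032</value>
  </property>
</configuration>
EOF
}
```

On the node from this recipe, the helper could be called as write_yarn_configs /opt/cluster/hadoop/etc/hadoop; note that this overwrites any existing copies of both files. Assuming the Hadoop sbin directory is on the PATH, the daemons can then be started with yarn-daemon.sh start resourcemanager and yarn-daemon.sh start nodemanager, after which jps should list ResourceManager and NodeManager alongside NameNode and DataNode.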

How it works...

The nn1.cluster1.com node is configured to run both HDFS and YARN components. Any file copied to HDFS is split into blocks, which are stored on the Datanode; the file's metadata is kept on the Namenode.

An operation on a text file, such as a word count, can be run as a simple MapReduce program. The job is submitted to the single-node cluster through the ResourceManager daemon and executed by the NodeManager. Many more steps take place under the hood; these are covered in the coming chapters.
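
To make the word-count idea concrete without a running cluster, the map, sort, and reduce phases can be imitated with a local pipeline. This is only an analogy under stated assumptions: no Hadoop component is involved, and wordcount_local is a hypothetical function name introduced for illustration.

```shell
# Local analogy of a word-count job; nothing here talks to Hadoop.
# "Map":          tr emits one word per line (grep drops empty lines).
# "Shuffle/sort": sort brings identical words next to each other.
# "Reduce":       uniq -c counts each group; awk prints "word<TAB>count".
wordcount_local() {
  tr -s '[:space:]' '\n' < "$1" |
    grep -v '^$' |
    sort |
    uniq -c |
    awk '{ print $2 "\t" $1 }'
}
```

Calling wordcount_local test.txt prints one line per distinct word with its count. On the cluster itself, the equivalent job would typically be submitted with the wordcount program from the bundled hadoop-mapreduce-examples JAR; the exact JAR path depends on the installation.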

Note

The single-node cluster is also called a pseudo-distributed cluster.

There's more...

A quick check can be done on the functionality of HDFS. You can create a simple text file and upload it to HDFS to see whether it is successful or not:

$ hadoop fs -put test.txt /

This copies the file test.txt to the root of HDFS. The file can then be listed and read directly from HDFS:

$ hadoop fs -ls /
$ hadoop fs -cat /test.txt
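
The same check can be wrapped into a small script that also confirms the file reads back unchanged. This is a sketch rather than part of the recipe: hdfs_roundtrip_check is a hypothetical helper, and it assumes the hadoop command is on the PATH and that no file of the same name already exists in the HDFS root directory.

```shell
# Hypothetical helper: upload a file to the HDFS root and verify that
# the copy read back from HDFS is identical to the local original.
hdfs_roundtrip_check() {
  local_file="$1"
  hdfs_path="/$(basename "$local_file")"

  # Upload; -put returns non-zero if the target already exists.
  hadoop fs -put "$local_file" / || return 1

  # Confirm that the path now exists in HDFS.
  hadoop fs -test -e "$hdfs_path" || return 1

  # Stream the HDFS copy and compare it byte for byte with the local file.
  if hadoop fs -cat "$hdfs_path" | cmp -s - "$local_file"; then
    echo "HDFS round trip OK: $hdfs_path"
  else
    echo "HDFS round trip FAILED: $hdfs_path" >&2
    return 1
  fi
}
```

For the file used above, the call would be hdfs_roundtrip_check test.txt, run as the hadoop user.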
