Search icon CANCEL
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Conferences
Free Learning
Arrow right icon
Arrow up icon
GO TO TOP
Hadoop 2.x Administration Cookbook

You're reading from   Hadoop 2.x Administration Cookbook Administer and maintain large Apache Hadoop clusters

Arrow left icon
Product type Paperback
Published in May 2017
Publisher Packt
ISBN-13 9781787126732
Length 348 pages
Edition 1st Edition
Tools
Arrow right icon
Author (1):
Arrow left icon
Aman Singh Aman Singh
Author Profile Icon Aman Singh
Aman Singh
Arrow right icon
View More author details
Toc

Table of Contents (14) Chapters Close

Preface 1. Hadoop Architecture and Deployment FREE CHAPTER 2. Maintaining Hadoop Cluster HDFS 3. Maintaining Hadoop Cluster – YARN and MapReduce 4. High Availability 5. Schedulers 6. Backup and Recovery 7. Data Ingestion and Workflow 8. Performance Tuning 9. HBase Administration 10. Cluster Planning 11. Troubleshooting, Diagnostics, and Best Practices 12. Security Index

Installing a single-node cluster - YARN components

In the previous recipe, we discussed how to set up Namenode and Datanode for HDFS. In this recipe, we will be covering how to set up YARN on the same node.

After completing this recipe, there will be four daemons running on the nn1.cluster1.com node, namely namenode, datanode, resourcemanager, and nodemanager daemons.

Getting ready

For this recipe, you will again use the same node on which we have already configured the HDFS layer.

All operations will be done by the hadoop user.

How to do it...

  1. Log in to the node nn1.cluster1.com and change to the hadoop user.
  2. Change to the /opt/cluster/hadoop/etc/hadoop directory and configure the files mapred-site.xml and yarn-site.xml:
    How to do it...
  3. The file yarn-site.xml specifies the shuffle class, scheduler, and resource management components of the ResourceManager. You only need to specify yarn.resourcemanager.address; the rest are automatically picked up by the ResourceManager. You can see from the following screenshot that you can separate them into their independent components:
    How to do it...
  4. Once the configurations are in place, the resourcemanager and nodemanager daemons need to be started:
    How to do it...
  5. The environment variables that were defined by /etc/profile.d/hadoopenv.sh included YARN_HOME and YARN_CONF_DIR, which let the framework know about the location of the YARN configurations.

How it works...

The nn1.cluster1.com node is configured to run HDFS and YARN components. Any file that is copied to the HDFS will be split into blocks and stored on Datanode. The metadata of the file will be on the Namenode.

Any operation performed on a text file, such as word count, can be done by running a simple MapReduce program, which will be submitted to the single node cluster using the ResourceManager daemon and executed by the NodeManager. There are a lot of steps and details as to what goes on under the hood, which will be covered in the coming chapters.

Note

The single-node cluster is also called pseudo-distributed cluster.

There's more...

A quick check can be done on the functionality of HDFS. You can create a simple text file and upload it to HDFS to see whether it is successful or not:

$ hadoop fs –put test.txt /

This will copy the file test.txt to the HDFS. The file can be read directly from HDFS:

$ hadoop fs –ls /
$ hadoop fs –cat /test.txt

See also

  • The Installing multi-node cluster recipe
You have been reading a chapter from
Hadoop 2.x Administration Cookbook
Published in: May 2017
Publisher: Packt
ISBN-13: 9781787126732
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at €18.99/month. Cancel anytime