Installing a single-node cluster - YARN components
In the previous recipe, we discussed how to set up Namenode and Datanode for HDFS. In this recipe, we will be covering how to set up YARN on the same node.
After completing this recipe, there will be four daemons running on the nn1.cluster1.com node, namely the namenode, datanode, resourcemanager, and nodemanager daemons.
Getting ready
For this recipe, you will again use the same node on which we have already configured the HDFS layer.
All operations will be done by the hadoop user.
How to do it...
- Log in to the node nn1.cluster1.com and change to the hadoop user.
- Change to the /opt/cluster/hadoop/etc/hadoop directory and configure the files mapred-site.xml and yarn-site.xml; a minimal example of both files is shown after these steps.
- The file yarn-site.xml specifies the shuffle class, scheduler, and resource management components of the ResourceManager. You only need to specify yarn.resourcemanager.address; the rest are picked up automatically by the ResourceManager, although they can also be set explicitly to separate the components onto independent addresses.
- Once the configurations are in place, the resourcemanager and nodemanager daemons need to be started, as shown after these steps.
- The environment variables that were defined by /etc/profile.d/hadoopenv.sh included YARN_HOME and YARN_CONF_DIR, which let the framework know the location of the YARN configurations.
How it works...
The nn1.cluster1.com node is configured to run HDFS and YARN components. Any file that is copied to HDFS will be split into blocks and stored on the Datanode, and the metadata of the file will be kept on the Namenode.
Any operation performed on a text file, such as word count, can be done by running a simple MapReduce program, which will be submitted to the single node cluster using the ResourceManager daemon and executed by the NodeManager. There are a lot of steps and details as to what goes on under the hood, which will be covered in the coming chapters.
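As a hedged illustration of that flow, the word count example bundled with the Hadoop distribution can be submitted to the cluster; the jar path, input file, and output directory below are assumptions for the layout used in this chapter (test.txt is uploaded in the There's more... section):

$ yarn jar /opt/cluster/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar wordcount /test.txt /wc_out
$ hadoop fs -cat /wc_out/part-r-00000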
Note
The single-node cluster is also called a pseudo-distributed cluster.
There's more...
A quick check can be done on the functionality of HDFS. You can create a simple text file and upload it to HDFS to see whether the copy succeeds:
$ hadoop fs -put test.txt /
This will copy the file test.txt to HDFS. The file can be read back directly from HDFS:
$ hadoop fs -ls /
$ hadoop fs -cat /test.txt
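A similar sanity check can be done on the YARN layer; yarn node -list should report the single NodeManager in the RUNNING state, and by default the ResourceManager web UI is served on port 8088 of the ResourceManager host:

$ yarn node -list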
See also
- The Installing multi-node cluster recipe