Adding nodes to the cluster
Over time, the data in our cluster will grow, and we will need to increase the capacity of the cluster by adding more nodes.
We can add Datanodes to the cluster in the same way that we first configured a Datanode and started the Datanode daemon on it. But the important thing to keep in mind is that not just any node should be able to become part of the cluster. It should not be possible for anyone to simply start a Datanode daemon on their laptop and join the cluster, as that would be disastrous. By default, there is nothing preventing a node from being a Datanode: a user just has to untar the Hadoop package, point the file "core-site.xml" to the Namenode, and start the Datanode daemon.
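To illustrate how little this takes, the snippet below is a minimal sketch of the core-site.xml entry such a node would need; the Namenode address nn1.cluster1.com:9000 is a hypothetical value used only for illustration, not a value from this recipe:

<!-- core-site.xml on the joining node; nn1.cluster1.com:9000 is a
     hypothetical Namenode address, not a value from this recipe -->
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://nn1.cluster1.com:9000</value>
</property>

The dfs.hosts property configured in the following steps exists precisely to close this gap.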
Getting ready
For the following steps, we assume that the cluster is up and running, with its Datanodes in a healthy state, and that we need to add a new Datanode to the cluster. We will log in to the Namenode and make the changes there.
How to do it...
- ssh to the Namenode and edit the file hdfs-site.xml to add the following property to it:

<property>
    <name>dfs.hosts</name>
    <value>/home/hadoop/includes</value>
    <final>true</final>
</property>
- Make sure the file includes is readable by the user hadoop.
- Restart the Namenode daemon for the property to take effect:

$ hadoop-daemon.sh stop namenode
$ hadoop-daemon.sh start namenode
- A restart of the Namenode is required only when a property is changed in the file hdfs-site.xml. Once the property is in place, the Namenode can pick up changes to the contents of the includes file by simply refreshing the nodes.
- Add the dn1.cluster1.com node to the file includes:

$ cat includes
dn1.cluster1.com
- The file includes or excludes can contain a list of multiple nodes, one node per line.
- After adding the node to the file, we just need to reload the file by entering the following:

$ hdfs dfsadmin -refreshNodes
- After some time, the node will be available in the cluster and can be seen in the report output; a short script tying these steps together follows this list:
$ hdfs dfsadmin -report
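The short script below is a minimal sketch that strings these steps together on the Namenode. It assumes the includes path configured in this recipe, that the Hadoop binaries are on the PATH of the user hadoop, and it reuses the example hostname from above:

# Run on the Namenode as the user hadoop; assumes dfs.hosts already
# points at /home/hadoop/includes, as configured in this recipe.
NEW_NODE="dn1.cluster1.com"
INCLUDES="/home/hadoop/includes"

# Append the new node only if it is not already listed.
grep -qx "$NEW_NODE" "$INCLUDES" || echo "$NEW_NODE" >> "$INCLUDES"

# Ask the Namenode to re-read the includes/excludes files.
hdfs dfsadmin -refreshNodes

# Confirm that the new node appears in the cluster report.
hdfs dfsadmin -report | grep -A 2 "$NEW_NODE"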
How it works...
The file /home/hadoop/includes contains a list of all the Datanodes that are allowed to join the cluster. If the includes file is blank, then all Datanodes are allowed to join the cluster. If there are both includes and excludes files, the lists of nodes in the two files must be mutually exclusive. So, to decommission the node dn1.cluster1.com from the cluster, it must be removed from the includes file and added to the excludes file.
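That decommissioning flow can be sketched the same way. The excludes path below is an assumption made by analogy with the includes file; this recipe does not configure dfs.hosts.exclude, so treat it as illustrative only:

# Hypothetical flow; assumes dfs.hosts.exclude is set to
# /home/hadoop/excludes in hdfs-site.xml (not configured in this recipe).
NODE="dn1.cluster1.com"

# Remove the node from includes and add it to excludes.
sed -i "/^${NODE}$/d" /home/hadoop/includes
echo "$NODE" >> /home/hadoop/excludes

# Have the Namenode re-read both files; the node moves to the
# "Decommission in progress" state while its blocks are re-replicated.
hdfs dfsadmin -refreshNodes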
There's more...
In addition to controlling which nodes may join, as described here, production clusters typically also have firewall rules in place and separate VLANs for the Hadoop cluster to keep its traffic and data isolated.
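As an illustration of the firewall side, rules along the following lines would restrict the Namenode's RPC port to the cluster's own subnet. The port 8020 is a common Hadoop 2 default, and the subnet 10.0.1.0/24 is purely an assumption for this sketch:

# Hypothetical iptables rules on the Namenode: accept Namenode RPC
# traffic (port 8020, a common default) only from the assumed cluster
# subnet 10.0.1.0/24, and drop it from everywhere else.
iptables -A INPUT -p tcp --dport 8020 -s 10.0.1.0/24 -j ACCEPT
iptables -A INPUT -p tcp --dport 8020 -j DROP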