Scaling
You have data and you have a running Hadoop cluster; now you get more of the former and need more of the latter. We have said repeatedly that Hadoop is an easily scalable system. So let us add some new capacity.
Adding capacity to a local Hadoop cluster
Hopefully, at this point, you should feel pretty underwhelmed at the idea of adding another node to a running cluster. All through Chapter 6, When Things Break, we constantly killed and restarted nodes. Adding a new node is really no different; all you need to do is perform the following steps:
1. Install Hadoop on the host.
2. Set the environment variables shown in Chapter 2, Getting Up and Running.
3. Copy the configuration files into the conf directory of the installation.
4. Add the host's DNS name or IP address to the slaves file on the node from which you usually run commands such as slaves.sh or the cluster start/stop scripts.
And that's it!
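The last two steps can be sketched as a small shell script. The hostname node5.example.com, the add_slave helper, and the /opt/hadoop path are illustrative inventions, not from the book; the daemon-start commands assume the Hadoop 1.x layout (conf directory, slaves file, TaskTracker) used throughout these chapters.

```shell
#!/bin/sh
# Sketch of registering a new worker node; names and paths are illustrative.
NEW_NODE="node5.example.com"   # hypothetical DNS name of the new host

# Append the node to a slaves file only if it is not already listed,
# so the script is safe to re-run.
add_slave() {
    node="$1"; slaves_file="$2"
    grep -qxF "$node" "$slaves_file" 2>/dev/null \
        || printf '%s\n' "$node" >> "$slaves_file"
}

# Demonstrated here against a scratch file; on a real cluster this would
# be $HADOOP_HOME/conf/slaves on the node you run the control scripts from.
SLAVES_FILE="$(mktemp)"
add_slave "$NEW_NODE" "$SLAVES_FILE"
add_slave "$NEW_NODE" "$SLAVES_FILE"   # second call is a no-op
cat "$SLAVES_FILE"                     # prints the hostname once

# On the real cluster you would then copy the configuration across and
# start the worker daemons on the new node, for example:
#   scp "$HADOOP_HOME"/conf/* "$NEW_NODE:$HADOOP_HOME/conf/"
#   ssh "$NEW_NODE" "$HADOOP_HOME/bin/hadoop-daemon.sh start datanode"
#   ssh "$NEW_NODE" "$HADOOP_HOME/bin/hadoop-daemon.sh start tasktracker"
```

The guard in add_slave matters in practice: cluster start/stop scripts iterate over every line of the slaves file, so an accidental duplicate entry means duplicate SSH invocations against the same host.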
Have a go hero – adding a node and running balancer
Try out the process of adding a new node and afterwards...