Jobtracker high availability
In MRv1, if the jobtracker fails, all running jobs and tasks are lost. In addition, the jobtracker service and its jobs need to be restarted manually. To avoid these issues, the jobtracker needs to be configured for high availability. CDH5 ships with a jobtracker HA package.
Configuring jobtracker high availability
Perform the following steps as the user hduser to configure an HA jobtracker for your cluster:
- Stop all the tasktrackers by executing the following command on all the nodes that host tasktrackers:
$ sudo service hadoop-0.20-mapreduce-tasktracker stop
- Stop the jobtracker by executing the following command on the node that hosts the jobtracker:
$ sudo service hadoop-0.20-mapreduce-jobtracker stop
- Remove the installed jobtracker using the following command from node1.hcluster:
$ sudo yum remove hadoop-0.20-mapreduce-jobtracker
- Install the following HA jobtracker package on two independent nodes, which in our case would be node1.hcluster and node2.hcluster...
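The stop-and-remove steps above (stopping the tasktrackers, stopping the jobtracker, and removing the jobtracker package) could be scripted roughly as follows. This is a minimal sketch, not part of the CDH5 tooling: the function names `run_on` and `stop_mrv1_daemons`, the `DRY_RUN` switch, and the tasktracker hostnames `node3.hcluster` and `node4.hcluster` are hypothetical placeholders, and a real run assumes SSH access to each node with sudo rights that do not require an interactive prompt. By default the script only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Sketch of the stop-and-remove steps. Dry run by default:
# commands are printed per host instead of being executed.
set -eu

# node1.hcluster hosts the jobtracker in this cluster.
JOBTRACKER_NODE="${JOBTRACKER_NODE:-node1.hcluster}"
# Hypothetical tasktracker hosts; replace with the nodes in your cluster.
TASKTRACKER_NODES="${TASKTRACKER_NODES:-node3.hcluster node4.hcluster}"
# Set DRY_RUN=0 to actually run the commands over ssh.
DRY_RUN="${DRY_RUN:-1}"

# Run (or, in dry-run mode, print) a command on the given host.
run_on() {
  local host="$1"
  shift
  if [ "$DRY_RUN" = "1" ]; then
    echo "[$host] $*"
  else
    ssh "$host" "$@"
  fi
}

stop_mrv1_daemons() {
  local host
  # Stop every tasktracker first.
  for host in $TASKTRACKER_NODES; do
    run_on "$host" sudo service hadoop-0.20-mapreduce-tasktracker stop
  done
  # Then stop the jobtracker and remove its package.
  run_on "$JOBTRACKER_NODE" sudo service hadoop-0.20-mapreduce-jobtracker stop
  run_on "$JOBTRACKER_NODE" sudo yum remove hadoop-0.20-mapreduce-jobtracker
}

stop_mrv1_daemons
```

Reviewing the dry-run output before setting `DRY_RUN=0` is a cheap way to confirm that each command targets the intended host.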