Common problems and their solutions
The following is a list of common problems and their solutions:
- When I try to format the HDFS node, I get the exception java.io.IOException: Incompatible clusterIDs in namenode and datanode.
This issue usually appears if you have a different or older cluster and you are trying to format a new namenode while the datanodes still point to the older cluster ID. It can be handled in one of the following ways:
- By deleting the DFS data folder (you can find its location in hdfs-site.xml) and restarting the cluster
- By modifying the VERSION file of HDFS, usually located at <HDFS-STORAGE-PATH>/hdfs/datanode/current/, so that its clusterID matches the namenode's
- By formatting namenode with the problematic datanode's cluster ID:
$ hdfs namenode -format -clusterId <cluster-id>
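Before choosing one of the options above, it can help to confirm that the cluster IDs really differ. The following is a minimal sketch, assuming the standard VERSION file layout in which the ID is stored on a line of the form clusterID=<value>. The function names cluster_id_of and compare_cluster_ids are hypothetical helpers, not part of Hadoop, and the paths you pass in depend on the storage directories configured in hdfs-site.xml.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Hadoop) for inspecting HDFS VERSION files.

# Print the clusterID recorded in the VERSION file given as $1.
cluster_id_of() {
    sed -n 's/^clusterID=//p' "$1"
}

# Compare the clusterID of a namenode VERSION file ($1) with that of a
# datanode VERSION file ($2). Returns 0 on a match and 1 on a mismatch.
compare_cluster_ids() {
    nn_id=$(cluster_id_of "$1")
    dn_id=$(cluster_id_of "$2")
    if [ "$nn_id" = "$dn_id" ]; then
        echo "match: $nn_id"
    else
        echo "mismatch: namenode=$nn_id datanode=$dn_id"
        return 1
    fi
}
```

Calling compare_cluster_ids with the namenode's and the datanode's VERSION file paths prints either the shared ID or both IDs. On a mismatch, the datanode's value is the one to pass to -clusterId, or to overwrite with the namenode's value.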
- My Hadoop instance is not starting up with the ./start-all.sh script. When I try to access the web application, it shows a page not found error.
This could be happening because of a number of issues. To understand the issue, you must look at the Hadoop logs first. Typically, Hadoop logs can be accessed from the /var/log folder if the precompiled binaries are installed as the root user. Otherwise, they are available inside the Hadoop installation folder.
- I have set up an N-node cluster, and I am running the Hadoop cluster with ./start-all.sh. I am not seeing many nodes in the YARN/NameNode web application.
This, again, can happen for multiple reasons. You need to verify the following:
- Can you reach (connect to) each of the cluster nodes from the namenode by using the IP address/machine name? If not, you need to have an entry in the /etc/hosts file.
- Is the ssh login working without a password? If not, you need to put the authorization keys in place to ensure logins without a password.
- Is datanode/nodemanager running on each of the nodes, and can you connect to namenode/AM? You can validate this by running ssh on the node running namenode/AM.
- If all these are working fine, you need to check the logs and see if there are any exceptions as explained in the previous question.
- Based on the log errors/exceptions, specific action has to be taken.
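The checks in the list above could be scripted roughly as follows. This is only a sketch under several assumptions: getent is available (Linux), the JDK's jps tool is on the PATH of each node, log files end in .log, and the helper names has_daemon, check_node, and recent_errors are hypothetical, as is any hostname or log directory you pass in.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Hadoop) for checking cluster nodes.

# Succeed if the jps listing on stdin contains the daemon named in $1.
has_daemon() {
    awk -v name="$1" '$2 == name { found = 1 } END { exit !found }'
}

# Run the checks for one worker node ($1), from the namenode host.
check_node() {
    host=$1
    # 1. The machine name must resolve (for example, through /etc/hosts).
    if ! getent hosts "$host" >/dev/null; then
        echo "$host: cannot resolve name"
        return 1
    fi
    # 2. ssh must work without a password; BatchMode disables prompting.
    if ! ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true 2>/dev/null; then
        echo "$host: passwordless ssh failed"
        return 1
    fi
    # 3. The datanode and nodemanager JVMs must be running on the node.
    listing=$(ssh -o BatchMode=yes "$host" jps)
    for daemon in DataNode NodeManager; do
        if ! printf '%s\n' "$listing" | has_daemon "$daemon"; then
            echo "$host: $daemon is not running"
            return 1
        fi
    done
    echo "$host: ok"
}

# Print the last lines (default 20) in the log directory $1 that look
# like errors or exceptions.
recent_errors() {
    grep -E -h 'ERROR|FATAL|Exception' "$1"/*.log | tail -n "${2:-20}"
}
```

A loop such as for h in <node1> <node2>; do check_node "$h"; done covers the first three checks. For any node that fails, recent_errors run against that node's Hadoop log directory gives a starting point for the log review.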