Time for action – killing a DataNode process
Firstly, we'll kill a DataNode. Recall that the DataNode process runs on each host in the HDFS cluster and is responsible for the management of blocks within the HDFS filesystem. Because Hadoop, by default, uses a replication factor of 3 for blocks, we should expect a single DataNode failure to have no direct impact on availability; instead, it will result in some blocks temporarily falling below the replication threshold. Execute the following steps to kill a DataNode process:
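As a reminder, the replication factor is controlled by the dfs.replication property in hdfs-site.xml. The following is a minimal sketch of that setting; the property name is standard, and the value shown is simply Hadoop's default rather than anything specific to our cluster:

<!-- hdfs-site.xml: how many copies of each block HDFS keeps. -->
<!-- With the default of 3, a single DataNode failure still leaves -->
<!-- two live replicas of every affected block. -->
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>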
1.  First, check the initial status of the cluster and confirm that everything is healthy. We'll use the dfsadmin command for this:

    $ hadoop dfsadmin -report
    Configured Capacity: 81376493568 (75.79 GB)
    Present Capacity: 61117323920 (56.92 GB)
    DFS Remaining: 59576766464 (55.49 GB)
    DFS Used: 1540557456 (1.43 GB)
    DFS Used%: 2.52%
    Under replicated blocks: 0
    Blocks with corrupt replicas: 0
    Missing blocks: 0

    -------------------------------------------------
    Datanodes available...