Setting up Hadoop to spread disk I/O
Modern servers usually have multiple disk devices to provide large storage capacity. These disks are typically configured as a RAID array at the factory. This default is suitable for many workloads, but not for Hadoop.
The Hadoop slave node stores HDFS data blocks and MapReduce temporary files on its local disks. These local disk operations benefit from using multiple independent disks to spread disk I/O.
In this recipe, we will describe how to set up Hadoop to use multiple disks to spread its disk I/O.
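As a preview of the idea, spreading is done by listing one directory per physical disk in Hadoop's storage properties, separated by commas. A minimal sketch, assuming a Hadoop 1.x-era configuration with disks mounted at /mnt/d0 through /mnt/d2 (the paths are illustrative; in Hadoop 2 and later the HDFS property is named dfs.datanode.data.dir):

```xml
<!-- hdfs-site.xml: one HDFS data directory per disk (hypothetical paths) -->
<property>
  <name>dfs.data.dir</name>
  <value>/mnt/d0/dfs/data,/mnt/d1/dfs/data,/mnt/d2/dfs/data</value>
</property>

<!-- mapred-site.xml: spread MapReduce temporary files the same way -->
<property>
  <name>mapred.local.dir</name>
  <value>/mnt/d0/mapred/local,/mnt/d1/mapred/local,/mnt/d2/mapred/local</value>
</property>
```

Hadoop round-robins writes across the listed directories, so each independent disk carries a share of the I/O load.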
Getting ready
We assume you have multiple disks on each DataNode. These disks are in a JBOD (Just a Bunch Of Disks) or RAID0 configuration. Assume that the disks are mounted at /mnt/d0, /mnt/d1, ..., /mnt/dn, and that the user who starts HDFS has write permission on each mount point.
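The write-permission prerequisite is easy to verify up front. A small sketch, assuming three disks; `BASE` defaults to a temporary directory so the check can be tried anywhere, but on a real DataNode you would set `BASE=/mnt`:

```shell
# Verify that the current user can write to each mount point.
# BASE and the d0..d2 names are illustrative stand-ins for real mounts.
BASE="${BASE:-$(mktemp -d)}"
mkdir -p "$BASE/d0" "$BASE/d1" "$BASE/d2"

for disk in "$BASE"/d*; do
  if touch "$disk/.hdfs_write_test" 2>/dev/null; then
    rm "$disk/.hdfs_write_test"
    echo "OK: $disk is writable"
  else
    echo "FAIL: $disk is not writable" >&2
  fi
done
```

Run this as the user who will start HDFS, not as root, since it is that user's permissions that matter.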
How to do it...
In order to set up Hadoop to spread disk I/O, follow these instructions:
1. On each DataNode, create a directory on each disk for HDFS to store its data blocks...
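This step can be sketched as follows; `BASE` defaults to a temporary directory so the commands can be tried anywhere, and the dfs/data layout and restrictive permissions are illustrative conventions rather than requirements:

```shell
# Create one HDFS data directory per disk (illustrative paths).
# On a real DataNode, set BASE=/mnt and run as the HDFS user.
BASE="${BASE:-$(mktemp -d)}"
for disk in d0 d1 d2; do
  mkdir -p "$BASE/$disk/dfs/data"   # block storage directory on this disk
  chmod 700 "$BASE/$disk/dfs/data"  # restrict access to the HDFS user
done
```

Each of these directories would then be listed, comma-separated, in the DataNode's data directory property.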