Working with HDFS
To get the best performance from HBase, it is essential to get optimal performance from Hadoop/HDFS.
There are many parameters we could tune, but we will limit ourselves to the ones that deliver the clearest benefits.
- Multiple disk mount points: dfs.datanode.data.dir -> use all disks attached to the DataNode
- DFS block size: dfs.blocksize = 128 MB
- Local file system buffer: io.file.buffer.size = 131072 (128 KB)
- Sort factor: io.sort.factor = 50 to 100
- DataNode and NameNode concurrency: dfs.namenode.handler.count, and dfs.datanode.max.transfer.threads = 4096
Give HDFS as many paths as possible to spread the disk I/O around and to increase the capacity of HDFS.
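A minimal hdfs-site.xml and core-site.xml sketch of these settings follows. The mount points /data/1 through /data/4 are hypothetical placeholders for the disks actually attached to your DataNodes, and the dfs.namenode.handler.count value is only illustrative; the remaining values mirror the list above.

    <!-- hdfs-site.xml (sketch; adjust paths and values to your cluster) -->
    <property>
      <name>dfs.datanode.data.dir</name>
      <!-- one directory per physical disk, comma-separated -->
      <value>/data/1/dfs/dn,/data/2/dfs/dn,/data/3/dfs/dn,/data/4/dfs/dn</value>
    </property>
    <property>
      <name>dfs.blocksize</name>
      <value>134217728</value> <!-- 128 MB -->
    </property>
    <property>
      <name>dfs.namenode.handler.count</name>
      <value>100</value> <!-- illustrative; scale with the number of DataNodes -->
    </property>
    <property>
      <name>dfs.datanode.max.transfer.threads</name>
      <value>4096</value>
    </property>

    <!-- core-site.xml -->
    <property>
      <name>io.file.buffer.size</name>
      <value>131072</value> <!-- 128 KB -->
    </property>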
How to do it…
Open the dfs.block.size setting in hdfs-site.xml, and the mapred.min.split.size and mapred.max.split.size settings in mapred-site.xml.
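As a sketch, the mapred-site.xml side looks like the following (the block size itself is set in hdfs-site.xml, as shown earlier). The split values here are illustrative; on newer Hadoop releases the same knobs are exposed as mapreduce.input.fileinputformat.split.minsize and mapreduce.input.fileinputformat.split.maxsize.

    <!-- mapred-site.xml (sketch; values are illustrative) -->
    <property>
      <name>mapred.min.split.size</name>
      <value>134217728</value> <!-- 128 MB: do not split below one block -->
    </property>
    <property>
      <name>mapred.max.split.size</name>
      <value>268435456</value> <!-- 256 MB: allow a split to span two blocks -->
    </property>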
The input split size can be tuned against the total input data size and aligned with the HDFS block size.
This also helps us reduce the number of map tasks; with fewer map tasks there is less scheduling and start-up overhead, so performance improves. This...
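For example, with roughly 10 GB of input (an illustrative figure) and a 128 MB split size, FileInputFormat schedules about 10,240 / 128 = 80 map tasks; raising the effective split size to 256 MB cuts that to about 40, halving the per-task scheduling and start-up cost.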