HDFS block placement
Replication is an important feature in HDFS: it protects data against loss and keeps it highly available in the face of failures. The default replication factor is 3, though this can be tuned through the dfs.replication configuration parameter. HDFS does not replicate a file as a single unit; rather, it splits the file into fixed-size blocks and replicates those blocks across the cluster.
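As a sketch, the cluster-wide default mentioned above can be tuned in hdfs-site.xml; the property name is the standard Hadoop one, while the value shown is purely illustrative:

```xml
<configuration>
  <!-- Default number of replicas for newly created files. -->
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>
```

An existing file's replication factor can also be adjusted after creation, for example with `hdfs dfs -setrep -w 2 <path>`, where `-w` waits for the replication to complete.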
The replication factor can be specified at file creation time and changed later at any point. A salient feature of HDFS, one that distinguishes it from many other distributed filesystems, is its smart placement of blocks. The placement policy is rack-aware: it is cognizant of the physical rack on which each replica resides. This not only aids fault tolerance but also makes network bandwidth utilization more efficient. Any computing framework running on HDFS can use this information to minimize the amount of data that needs to be moved over the network, by scheduling computation close to the replicas it reads.
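To make the rack-aware policy concrete, the following is a simplified sketch of the default placement for three replicas: the first replica goes on the writer's own node, the second on a node in a different rack (surviving the loss of an entire rack), and the third on another node in that same remote rack (avoiding a second cross-rack transfer on the write path). The function and data layout here are illustrative, not the actual NameNode implementation:

```python
import random

def place_replicas(writer_node, racks):
    """Choose three datanodes for a block, following the default
    HDFS policy. `racks` maps a rack id to its list of datanodes."""
    # Find the rack containing the writer's node.
    writer_rack = next(r for r, nodes in racks.items() if writer_node in nodes)
    # 1st replica: the node the client is writing from.
    chosen = [writer_node]
    # 2nd replica: any node on a different rack, so that losing one
    # whole rack cannot destroy every copy of the block.
    remote_rack = random.choice([r for r in racks if r != writer_rack])
    second = random.choice(racks[remote_rack])
    chosen.append(second)
    # 3rd replica: a different node on that same remote rack, which
    # keeps the write pipeline to a single cross-rack hop.
    third = random.choice([n for n in racks[remote_rack] if n != second])
    chosen.append(third)
    return chosen
```

Note the trade-off this encodes: two racks give fault tolerance, while placing two of the three replicas in the same rack limits the cross-rack traffic generated per write.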