The Hadoop Distributed File System (HDFS) is, as the name states, a file system, comparable to those used by Windows (FAT or NTFS) and Linux (ext3, ext4, and so on), but designed specifically for distributed systems that need to process large datasets with high availability. It is optimized to spread data across multiple nodes in a clustered computing environment to ensure maximum data availability. The main difference between HDFS and other distributed file systems is that it is designed to run on low-cost commodity hardware and is highly fault-tolerant. The basic architecture of HDFS is shown in the following diagram:
HDFS has two main components that work on a master-slave concept. These two components are the NameNode, which is the master node, and the DataNode, which is a slave...
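To make this division of labour concrete, here is a minimal sketch that uses the standard Hadoop Java FileSystem API to write and then read a small file. The NameNode address hdfs://namenode-host:8020 and the path /user/example/hello.txt are placeholder values, not taken from this book; the point of the sketch is that the client contacts the NameNode only for metadata, while the actual file bytes flow between the client and the DataNodes.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsReadWriteExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Placeholder NameNode address; replace with your cluster's fs.defaultFS value.
        conf.set("fs.defaultFS", "hdfs://namenode-host:8020");

        FileSystem fs = FileSystem.get(conf);
        Path file = new Path("/user/example/hello.txt");

        // Write: the client asks the NameNode (master) where to place the blocks,
        // then streams the bytes directly to the chosen DataNodes (slaves).
        try (FSDataOutputStream out = fs.create(file, true)) {
            out.writeUTF("Hello, HDFS!");
        }

        // Read: the NameNode returns the block locations, and the client
        // fetches the data from the DataNodes that hold those blocks.
        try (FSDataInputStream in = fs.open(file)) {
            System.out.println(in.readUTF());
        }

        fs.close();
    }
}

Note that the master handles only namespace and block-location metadata in this exchange; the heavy data transfer happens on the slave side, which is what lets HDFS scale by simply adding more DataNodes.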