The read/write operational flow in HDFS
To understand HDFS better, we need to look at the flow of operations for the following two scenarios:
A file is written to HDFS
A file is read from HDFS
HDFS follows a write-once, read-many model: files are written once and read many times, and data cannot be altered once written. However, data can be appended to a file by reopening it. All files in HDFS are stored as data blocks.
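To make this concrete, here is a minimal sketch using the Hadoop Java client (assuming a reachable HDFS cluster and the Hadoop client libraries on the classpath; the path and record contents are hypothetical). The file is created and written once; to add data later, it is reopened with append() rather than modified in place:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WriteOnceAppend {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path file = new Path("/user/demo/events.log"); // hypothetical path

        // Write once: the file is created and its contents are written.
        try (FSDataOutputStream out = fs.create(file, false)) {
            out.writeBytes("first record\n");
        }

        // The written bytes cannot be altered in place, but reopening
        // the file with append() adds new data at its end.
        try (FSDataOutputStream out = fs.append(file)) {
            out.writeBytes("appended record\n");
        }

        fs.close();
    }
}
```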
Writing files in HDFS
The following sequence of steps occurs when a client writes a file to HDFS:
The client informs the namenode daemon that it wants to write a file. The namenode daemon checks whether the file already exists.
If it does, an error is returned to the client. If it does not, the namenode daemon makes a metadata entry for the new file.
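From the client's point of view, this namenode exchange happens underneath a create() call. A minimal sketch of that behavior follows (the path and payload are hypothetical; overwriting is disabled, so an existing file surfaces as an error on the client):

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CreateWithExistenceCheck {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path file = new Path("/user/demo/report.txt"); // hypothetical path

        try (FSDataOutputStream out = fs.create(file, false /* no overwrite */)) {
            // create() returns only after the namenode has confirmed the
            // path is free and recorded a metadata entry for the new file.
            out.writeBytes("payload\n");
        } catch (IOException e) {
            // If the file already exists, the namenode rejects the request
            // and the client receives an error instead of an output stream.
            System.err.println("Create failed: " + e.getMessage());
        }

        fs.close();
    }
}
```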
The file to be written is split into data packets at the client end and a data queue is built. The packets in the queue are then streamed to the datanodes in the...