HDFS
HDFS is a popular choice for storing and retrieving the data files produced by IoT solutions. It can hold large amounts of data reliably and scalably. Its design is based on the Google File System (https://ai.google/research/pubs/pub51). HDFS splits individual files into fixed-size blocks that are stored on machines across the cluster. To ensure reliability, it replicates the file blocks and distributes them across the cluster; by default, the replication factor is 3. HDFS has two main architecture components:
- The first, the NameNode, stores the metadata for the entire filesystem, such as filenames, their permissions, and the location of each block of each file.
- The second, the DataNode (one or more), is where the file blocks are stored. DataNodes communicate via Remote Procedure Calls (RPCs) serialized with Protocol Buffers (protobufs).
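The block-splitting and replication scheme described above can be sketched in a few lines of Python. This is a simplified illustration, not real HDFS code: the block size, the DataNode names, and the round-robin placement policy are all assumptions made for the example (real HDFS defaults to 128 MB blocks and uses rack-aware replica placement).

```python
import itertools

BLOCK_SIZE = 4          # bytes per block; real HDFS defaults to 128 MB
REPLICATION_FACTOR = 3  # the HDFS default mentioned above
DATANODES = ["dn1", "dn2", "dn3", "dn4"]  # hypothetical cluster nodes

def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE):
    """Split a file's bytes into fixed-size blocks (the last may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_replicas(num_blocks: int):
    """Assign each block to REPLICATION_FACTOR DataNodes (round-robin here)."""
    placement = {}
    node_cycle = itertools.cycle(DATANODES)
    for block_id in range(num_blocks):
        placement[block_id] = [next(node_cycle) for _ in range(REPLICATION_FACTOR)]
    return placement

# NameNode-style metadata: which blocks make up the file,
# and which DataNodes hold each block's replicas.
data = b"sensor-reading-stream"
blocks = split_into_blocks(data)
block_map = place_replicas(len(blocks))

print(len(blocks))   # number of blocks for this file
print(block_map[0])  # DataNodes holding replicas of block 0
```

The `block_map` dictionary plays the role of the NameNode's metadata: clients consult it to find which DataNodes hold a given block, then fetch the block bytes from those nodes directly.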
Note
RPC is a protocol that one program can use to request a service from a program located on another computer on a network without having to know the network's details. A...