An introduction to distributed file systems
A distributed file system supports practically the same basic operations as any other file system, such as storing, reading, and deleting files, and assigning security levels. The main difference is that it can use many servers at the same time without exposing the complexity of synchronization to the user. This means we can store large files across different server nodes without having to manage redundancy or parallel operations ourselves.
There are many frameworks for distributed file systems, such as Red Hat Cluster FS, the Ceph File System, the Hadoop Distributed File System (HDFS), and the Tachyon File System.
In this chapter, we will use HDFS, an open source implementation of the Google File System design, built to store large files across a cluster of commodity hardware. An HDFS cluster consists of a NameNode, which manages operations across the file system, and a series of DataNodes, which manage the storage of the files on the individual cluster nodes.
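To make this concrete, the following is a minimal sketch of the basic operations mentioned earlier (storing, reading, listing, and deleting a file) performed against an HDFS cluster from Python. It assumes the third-party HdfsCLI package (hdfs) is installed and that the NameNode exposes its WebHDFS interface at a hypothetical address; adjust the URL and user to match your own cluster.

    from hdfs import InsecureClient

    # Connect to the NameNode's WebHDFS endpoint (hypothetical address and user).
    client = InsecureClient('http://localhost:50070', user='hadoop')

    # Store a file: the NameNode coordinates placement, the DataNodes hold the data.
    client.makedirs('/user/hadoop/demo')
    with client.write('/user/hadoop/demo/sample.txt', encoding='utf-8', overwrite=True) as writer:
        writer.write('Hello, HDFS!\n')

    # Read the file back through the same client.
    with client.read('/user/hadoop/demo/sample.txt', encoding='utf-8') as reader:
        print(reader.read())

    # List the directory contents and then delete the file.
    print(client.list('/user/hadoop/demo'))
    client.delete('/user/hadoop/demo/sample.txt')

Note that the client only talks to the NameNode to resolve where data lives; the actual file contents flow to and from the DataNodes, which is what allows the cluster to scale storage and throughput by adding nodes.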