Today, data is being used to generate new opportunities, and companies can generate more business using data analytics. A single machine is not enough to handle growing volumes of data. Hence, it is necessary to distribute data between multiple machines. The Distributed File System consists of a number of machines, with data distributed between them. Hadoop offers a distributed storage File System, which is known as the Hadoop Distributed File System (HDFS). HDFS is capable of handling high volumes of data and is easily scalable. It also handles machine failure without any data loss.Â
In this chapter, we will look at the following topics:
- The details of the HDFS architecture
- Read/write operations
- The internals of HDFS components
- HDFS commands and understanding their internal working