Hadoop is a programming framework that supports the processing and storage of large data sets in a distributed computing environment. The Hadoop core comprises the MapReduce analytics engine and the Hadoop Distributed File System (HDFS). HDFS has several weaknesses, listed as follows:
- It had a single point of failure until the recent versions of HDFS
- It isn't POSIX compliant
- It stores at least three copies of data
- It has a centralized name server, resulting in scalability challenges
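The third point refers to HDFS's block replication factor, which defaults to three and is set via the `dfs.replication` property. As a minimal sketch, the replication factor could be confirmed or changed in `hdfs-site.xml` like this (the property name and default come from standard Hadoop configuration; the surrounding file contents are assumed):

```xml
<!-- hdfs-site.xml: HDFS keeps dfs.replication copies of every block (default is 3) -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>
```

Lowering this value saves raw capacity at the cost of durability, which is why a backend with its own redundancy, such as Ceph, makes triple replication at the HDFS layer redundant.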
The Apache Hadoop project and other software vendors are working independently to fix these gaps in HDFS.
The Ceph community has done some development in this space: it provides a filesystem plugin for Hadoop that can potentially overcome these limitations of HDFS and serve as a drop-in replacement for...