Implementing a filesystem in Hadoop
Depending on the situation, you may need to replace HDFS with a filesystem of your choice. Hadoop provides out-of-the-box support for a few filesystems, such as S3. HDFS can be replaced either with a drop-in replacement or, as is the case with S3, by seamless integration with the S3 file store for input and output.
In this section, we will re-implement the S3 native filesystem and extend Hadoop. The code in this section illustrates the steps for replacing HDFS. Error handling and other S3-related features have been omitted for brevity.
The major steps in implementing a filesystem for Hadoop are as follows:
- The org.apache.hadoop.fs.FileSystem abstract class needs to be extended and all the abstract methods need to be overridden. Out-of-the-box implementations include FilterFileSystem, NativeS3FileSystem, S3FileSystem, RawLocalFileSystem, FTPFileSystem, and ViewFileSystem.
- The open method returns an FSDataInputStream object...
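The extend-and-override pattern described in these steps can be sketched without a Hadoop dependency. The example below is only an illustration under stated assumptions: SimpleFileSystem, SeekableInputStream, and InMemoryFileSystem are hypothetical stand-ins invented for this sketch, not Hadoop classes, and the real org.apache.hadoop.fs.FileSystem declares a different and larger set of abstract methods. The sketch shows a concrete class overriding every abstract method and an open method that returns a stream supporting seek, which is the capability FSDataInputStream adds over a plain input stream.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the seekable stream that open() returns.
// It is NOT Hadoop's stream class; it only models "read plus seek".
class SeekableInputStream extends InputStream {
    private final byte[] data;
    private int pos;

    SeekableInputStream(byte[] data) {
        this.data = data;
    }

    @Override
    public int read() {
        // Return the next byte as an unsigned value, or -1 at end of data.
        return pos < data.length ? (data[pos++] & 0xff) : -1;
    }

    // Jump to an absolute byte offset within the file.
    void seek(long newPos) throws IOException {
        if (newPos < 0 || newPos > data.length) {
            throw new IOException("Seek out of range: " + newPos);
        }
        pos = (int) newPos;
    }

    long getPos() {
        return pos;
    }
}

// Hypothetical, heavily reduced stand-in for the abstract filesystem contract.
abstract class SimpleFileSystem {
    abstract String getScheme();
    abstract SeekableInputStream open(String path) throws IOException;
    abstract void write(String path, byte[] contents) throws IOException;
    abstract boolean delete(String path) throws IOException;
}

// Concrete implementation: every abstract method is overridden.
// A map stands in for the remote store (S3 in the chapter's example).
class InMemoryFileSystem extends SimpleFileSystem {
    private final Map<String, byte[]> store = new HashMap<>();

    @Override
    String getScheme() {
        return "mem";
    }

    @Override
    SeekableInputStream open(String path) throws IOException {
        byte[] data = store.get(path);
        if (data == null) {
            throw new FileNotFoundException(path);
        }
        return new SeekableInputStream(data);
    }

    @Override
    void write(String path, byte[] contents) {
        store.put(path, contents.clone());
    }

    @Override
    boolean delete(String path) {
        return store.remove(path) != null;
    }
}

public class FileSystemSketch {
    public static void main(String[] args) throws IOException {
        SimpleFileSystem fs = new InMemoryFileSystem();
        fs.write("/data/part-0", "hello hadoop".getBytes(StandardCharsets.UTF_8));
        try (SeekableInputStream in = fs.open("/data/part-0")) {
            in.seek(6); // skip "hello "
            byte[] buf = new byte[6];
            int n = in.read(buf);
            System.out.println(new String(buf, 0, n, StandardCharsets.UTF_8)); // prints "hadoop"
        }
    }
}
```

Keeping the storage details behind the abstract class means client code depends only on the contract, which is what allows one backing store to be swapped for another.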