The Hadoop shell command collection
Collecting HDFS data from within the Hadoop layer solves many of the problems that affect host operating system collections. First, the collection only has to be performed from a single machine: a Hadoop client's command line exposes all HDFS files, so there is no need to collect data from each node individually. Second, the collected data is already assembled as logical Hadoop files, so no file carving or data reconstruction is required in the analysis phase.
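As a rough illustration of a single-machine collection, the following Python sketch wraps the standard `hdfs dfs -ls -R` and `hdfs dfs -get` shell commands. It assumes the `hdfs` client is on the PATH of the collection machine and already configured for the cluster. The function names, the `hdfs_listing.txt` file name, and the MD5 hashing of the local copies are illustrative additions rather than a prescribed procedure; the hashes describe the copied files after they land on local storage, not the blocks stored on the data nodes.

```python
import hashlib
import subprocess
from pathlib import Path


def build_list_command(hdfs_path):
    """Return the argv for a recursive listing of an HDFS path."""
    return ["hdfs", "dfs", "-ls", "-R", hdfs_path]


def build_copy_command(hdfs_path, local_dir):
    """Return the argv for copying an HDFS path to local storage."""
    return ["hdfs", "dfs", "-get", hdfs_path, str(local_dir)]


def md5_of_file(path, chunk_size=1024 * 1024):
    """Hash a local file in chunks so large files do not fill memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def collect(hdfs_path, local_dir):
    """Save a listing, copy the HDFS path, and hash every local file.

    Hypothetical helper: requires a working `hdfs` client on this machine.
    """
    local_dir = Path(local_dir)
    local_dir.mkdir(parents=True, exist_ok=True)

    # Record what the Hadoop layer reports before copying anything.
    listing = subprocess.run(
        build_list_command(hdfs_path),
        check=True, capture_output=True, text=True,
    ).stdout
    (local_dir / "hdfs_listing.txt").write_text(listing)

    # Copy the logical files; no per-node collection is involved.
    subprocess.run(build_copy_command(hdfs_path, local_dir), check=True)

    # Hash the local files (including the saved listing) for later verification.
    return {
        str(p): md5_of_file(p)
        for p in sorted(local_dir.rglob("*"))
        if p.is_file()
    }


if __name__ == "__main__":
    # Example paths only; substitute the real HDFS path and evidence directory.
    for name, digest in collect("/data", "evidence").items():
        print(digest, name)
```

Keeping the command construction in separate functions makes it possible to log the exact commands that were run, which helps when documenting the collection.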
The following is a list of limitations of collecting HDFS data from the Hadoop shell command line:
This method is only possible when Hadoop is online and its command line is accessible
Forensic tools such as dd and md5sum cannot easily be used during the collection of the data
Deleted data and data in memory that has not been written to disk may not be available
Hadoop...