Forensically collecting a cluster system
Collecting Hadoop data requires acquiring data across multiple cluster nodes. Hadoop's cluster design is structured, so data is distributed across multiple nodes. With the potential for node failure, that data is also redundantly stored across nodes. For a forensic investigator, this means data collection involves collecting data from most or all of the nodes.
In traditional forensic investigations, a single machine or server array is acquired. An investigator can pull the hard drive and perform a physical acquisition of the hard drive. The investigator may not be permitted to turn off the server and pull the server's hard drives. However, the investigator can access the server and collect the server data and any data on attached storage devices.
For Hadoop, or any cluster system, this is rarely the case. A Hadoop cluster may have a series of connected nodes, or its nodes could be geographically distributed. Regardless, multiple nodes are connected...