The Hadoop forensic evidence ecosystem
Forensics is based on evidence. For digital investigations, evidence is data. For Hadoop, the evidence is the information stored on disk and in memory. Not all information stored in Hadoop is relevant; what matters depends on the nature of the investigation, and evidence that is relevant in one investigation may not be relevant in another. This section summarizes the various sources of evidence and the overall ecosystem of Hadoop forensic evidence.
Standard Hadoop processes or system-generated diagnostic information may not be relevant to a forensic investigation. For example, a Hadoop cluster installed without any customizations that only stores and analyses web log data may not require collecting every file and all process data. Instead, a targeted collection of the web log data can be performed without losing relevant evidence. In other investigations, collecting the log and configuration files may be necessary.
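A targeted collection like the web log example can be sketched as a small script. This is a minimal illustration, not a prescribed procedure: it assumes the relevant data has already been exported from HDFS to a local filesystem (for example with `hdfs dfs -get`), and the directory names used (`weblogs`, the source and destination roots) are hypothetical. Hashing each copied file with SHA-256 is also an assumption added here, so the collected copy can later be checked against the source.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in 64 KiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def targeted_collect(source_root, dest_root, targets):
    """Copy only files beneath the relative `targets` directories.

    Returns a manifest mapping each collected file's relative path to its
    SHA-256 digest. Files outside the targets (e.g. configuration files)
    are deliberately left uncollected.
    """
    source_root = Path(source_root)
    dest_root = Path(dest_root)
    manifest = {}
    for target in targets:
        base = source_root / target
        if not base.is_dir():
            continue  # hypothetical target not present: skip rather than fail
        for src in sorted(base.rglob("*")):
            if not src.is_file():
                continue  # directories are recreated as needed below
            rel = src.relative_to(source_root)
            dst = dest_root / rel
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)  # copy2 also preserves file timestamps
            src_hash = sha256_of(src)
            if sha256_of(dst) != src_hash:
                raise OSError(f"hash mismatch after copying {rel}")
            manifest[rel.as_posix()] = src_hash
    return manifest
```

For instance, `targeted_collect("/mnt/export", "/evidence/case01", ["weblogs"])` (hypothetical paths) would copy only the web log tree. Widening the collection to logs and configuration files, as other investigations may require, is then a matter of adding those directories to `targets`.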
Forensic data in Hadoop falls into three categories:
- Supporting...