Chapter 5. Collecting Hadoop Application Data
Hadoop evidence can be forensically collected from more than just the filesystem; it can also be collected from Hadoop applications. Hadoop data is formatted for use by its applications, and these applications provide a simpler means of extracting relevant data. Collecting evidence from Hadoop applications instead of from HDFS offers many advantages, but the approach is very different. Some forensic artifacts, such as metadata, cannot be captured through an application collection. However, collecting data from an application avoids some of the time-consuming and challenging tasks involved in forensically imaging HDFS or collecting data from each node individually.
Any Hadoop software outside of the Hadoop layer is considered an application. Two of the most common application packages are Hive and HBase. Both operate much like databases, and their data can be collected through the software...