Application collection approaches
Hadoop data is stored in a unique structure. Unlike most relational database systems, which load and store data in a proprietary format, Hadoop applications typically store data in sets of flat files, similar to a hierarchical database. Files are imported into the application, which stores them in a separate file structure and generates metadata about the data.
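The import step described above can be pictured with a small toy model. This is only a sketch under stated assumptions, not the behavior of any particular Hadoop application: the directory layout, the `part-00000` file name, the `.meta.json` metadata file, and the functions `import_file`, `collect_via_application`, and `demo` are all hypothetical names chosen for illustration. It shows a flat file being copied into an application-owned directory, metadata being recorded separately, and the data being read back through that metadata.

```python
import csv
import json
import tempfile
from pathlib import Path
from typing import Dict, List


def import_file(warehouse: Path, table: str, source: Path, columns: List[str]) -> Path:
    """Copy a flat file into the application's own directory and record metadata."""
    table_dir = warehouse / table
    table_dir.mkdir(parents=True, exist_ok=True)

    # The data itself stays a flat file with no header or schema inside it.
    stored = table_dir / "part-00000"
    stored.write_bytes(source.read_bytes())

    # The metadata (column names, location) is kept apart from the data file.
    meta = {"table": table, "columns": columns, "location": str(table_dir)}
    (warehouse / f"{table}.meta.json").write_text(json.dumps(meta))
    return stored


def collect_via_application(warehouse: Path, table: str) -> List[Dict[str, str]]:
    """Read a table through its metadata, returning labeled records."""
    meta = json.loads((warehouse / f"{table}.meta.json").read_text())
    rows = []
    for part in sorted(Path(meta["location"]).iterdir()):
        with part.open(newline="") as handle:
            for record in csv.reader(handle):
                rows.append(dict(zip(meta["columns"], record)))
    return rows


def demo() -> List[Dict[str, str]]:
    """Import a two-line flat file and read it back with column names attached."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        source = root / "logins.csv"
        source.write_text("alice,2015-01-02\nbob,2015-01-03\n")
        warehouse = root / "warehouse"
        warehouse.mkdir()
        import_file(warehouse, "logins", source, ["user", "login_date"])
        return collect_via_application(warehouse, "logins")


if __name__ == "__main__":
    for row in demo():
        print(row)
```

In this model, a copy of `part-00000` taken directly from the filesystem yields only unlabeled comma-separated lines, while reading through the metadata yields records with column names attached.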
Application-based collections have advantages over filesystem-based collections of the application's underlying files. Although file-based storage in Hadoop applications makes it possible to take logical copies of the flat files, those files may not be in a format that can be analyzed quickly, and the investigator may need to sample files to identify the relevant ones. Collecting data from the applications has the following advantages:
- The investigator can collect the data in a format that is quickly and readily analyzable
- The data can be collected more easily by...