Hadoop data analysis tools
Hadoop was designed to store and analyze large volumes of data. The ecosystem of analysis tools built around it is large and complex, and the right tool depends on the type of analysis being performed. The Apache Software Foundation's set of tools includes a number of standard options such as Hive, HBase, and Pig, but other open source and commercial solutions have been developed to meet different analysis requirements using Hadoop's HDFS and MapReduce features. For example, Cloudera's Impala database runs on Hadoop, but it is not part of the Apache Software Foundation suite of applications.
Understanding which data analysis tools are used in a Hadoop cluster is important for identifying and properly collecting data. Some data analysis tools store data in formatted files, which may make the data easier to collect. Other tools read data directly from files stored in HDFS; in that case, the scripts written for the tool may provide useful information when the data is analyzed later. This...