Hadoop is often used for historical data analytics, although a new trend is emerging where it is used for real-time data streaming as well. Considering the offerings of Hadoop's ecosystem, we have broadly categorized them into the following categories:
- Data flow: This includes components that can transfer data to and from different subsystems to and from Hadoop including real-time, batch, micro-batching, and event-driven data processing.
- Data engine and frameworks: This provides programming capabilities on top of Hadoop YARN or MapReduce.
- Data storage: This category covers all types of data storage on top of HDFS.
- Machine learning and analytics: This category covers big data analytics and machine learning on top of Apache Hadoop.
- Search engine: This category covers search engines in both structured and unstructured Hadoop data.
- Management...