As we mentioned, ES-Hadoop covers two major areas: distributed computing and distributed storage. Its main goal is to connect Elasticsearch and Hadoop seamlessly so that each benefits from the other's strengths in distributed computing, distributed storage, search, analytics, visualization, and more. We can import Hadoop Distributed File System (HDFS) data into Elasticsearch for search and analysis, and export Elasticsearch data to HDFS for snapshot and restore. ES-Hadoop fully supports the Hadoop ecosystem, including Spark, Hive, Pig, Storm, Cascading, and, of course, standard MapReduce. Let's take a look at the data flow between Elasticsearch, ES-Hadoop, and the components of the Hadoop ecosystem, as shown in the following diagram:
In short, we can think of ES-Hadoop as a data bridge between Elasticsearch and the Hadoop big data ecosystem...
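To make the "bridge" idea concrete, here is a minimal sketch of both directions using ES-Hadoop's Spark SQL data source (`org.elasticsearch.spark.sql`) from PySpark. It assumes the elasticsearch-spark connector jar is on the Spark classpath and that Elasticsearch is reachable at `localhost:9200`; the HDFS paths, the `logs` index, and the helper names `es_options`, `hdfs_to_es`, `es_to_hdfs`, and `main` are hypothetical. Note that the export shown here is a plain data copy, not the Elasticsearch snapshot and restore mechanism itself.

```python
# Minimal sketch of ES-Hadoop as a data bridge between HDFS and Elasticsearch.
# Assumptions: the elasticsearch-spark connector jar is on the Spark classpath,
# Elasticsearch listens on localhost:9200, and all paths/index names are hypothetical.

ES_FORMAT = "org.elasticsearch.spark.sql"  # ES-Hadoop's Spark SQL data source


def es_options(nodes: str = "localhost", port: str = "9200") -> dict:
    """Build the ES-Hadoop connection settings (es.nodes, es.port)."""
    return {"es.nodes": nodes, "es.port": port}


def hdfs_to_es(spark, hdfs_path: str, index: str) -> None:
    """Import: read JSON records from HDFS and index them into Elasticsearch."""
    df = spark.read.json(hdfs_path)
    (df.write.format(ES_FORMAT)
       .options(**es_options())
       .mode("append")
       .save(index))


def es_to_hdfs(spark, index: str, hdfs_path: str) -> None:
    """Export: read an Elasticsearch index and write it to HDFS as Parquet."""
    df = spark.read.format(ES_FORMAT).options(**es_options()).load(index)
    df.write.mode("overwrite").parquet(hdfs_path)


def main() -> None:
    """Hypothetical driver: copy logs into Elasticsearch, then back out to HDFS."""
    from pyspark.sql import SparkSession  # requires pyspark at runtime

    spark = SparkSession.builder.appName("es-hadoop-bridge").getOrCreate()
    hdfs_to_es(spark, "hdfs:///data/logs/*.json", "logs")       # hypothetical path/index
    es_to_hdfs(spark, "logs", "hdfs:///backup/logs_parquet")     # hypothetical path
    spark.stop()
```

To try it, call `main()` from a driver script submitted with `spark-submit`, passing the connector jar (for example with `--jars`). Keeping the connection settings in one helper means the same `es.nodes`/`es.port` values are used for both the import and the export path.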