Using Solr 1301 Patch – reduce-side indexing
The Solr 1301 patch generates an index using the Apache Hadoop MapReduce framework. The patch was merged in Solr 4.7, so it is available in the code line of Apache Solr 4.7 and later versions. It is similar to the previously discussed patch (SOLR-1045); the difference is that Solr 1301 generates the indexes in the reduce phase of Apache Hadoop's MapReduce rather than in the map phase. Once the indexes are generated, they can be loaded into Solr or SolrCloud for further processing and application searching. The following diagram depicts the overall flow:
In the case of Solr 1301, a map task converts each input record into a <key, value> pair, and these pairs are then passed to the reducer. The reducer converts them into SolrInputDocument objects and publishes them, after which the documents are transformed into Solr indexes. The indexes are persisted directly on HDFS and can later be exported...
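The map-then-reduce flow described here can be illustrated with a small, dependency-free simulation. This is only a sketch of the idea, not the patch's actual code: it uses neither Hadoop nor Solr, the function names (map_record, shuffle, reduce_to_document, build_index) are hypothetical, the tab-separated record layout is an assumption, a plain dictionary stands in for SolrInputDocument, and a toy inverted index stands in for the Solr index written to HDFS.

```python
from collections import defaultdict


def map_record(line):
    """Map phase: turn one raw input record into a (key, value) pair.

    Assumes a hypothetical tab-separated layout: id, title, body.
    """
    doc_id, title, body = line.split("\t")
    return doc_id, {"title": title, "body": body}


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_to_document(key, values):
    """Reduce phase: build one document per key.

    The returned dict is a stand-in for a SolrInputDocument.
    """
    doc = {"id": key}
    for value in values:
        doc.update(value)
    return doc


def build_index(docs):
    """Stand-in for index creation: map each body term to sorted doc ids."""
    index = defaultdict(set)
    for doc in docs:
        for term in doc["body"].lower().split():
            index[term].add(doc["id"])
    return {term: sorted(ids) for term, ids in index.items()}


records = [
    "1\tHadoop\tbatch processing on HDFS",
    "2\tSolr\tsearch on HDFS indexes",
]

pairs = [map_record(r) for r in records]                      # map side
docs = [reduce_to_document(k, v) for k, v in shuffle(pairs).items()]  # reduce side
index = build_index(docs)                                     # indexing happens after reduce
```

The point of the sketch is the placement of the work: the map side only emits key/value pairs, while document construction and indexing both happen on the reduce side.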