Using the Solr 1301 patch – reduce-side indexing
Like the previous patch, the Solr 1301 patch (SOLR-1301) generates an index using the Apache Hadoop MapReduce framework. The patch was merged in Solr 4.7, so it is available in the code line of Apache Solr 4.7 and later versions. It is similar to the previous patch (SOLR-1045); the difference is that Solr 1301 generates the indexes in the reduce phase of Apache Hadoop's MapReduce rather than in the map phase. Once the indexes are generated, they can be loaded into Solr and SolrCloud for further processing and application searching. The following diagram depicts the overall flow:
In the case of Solr 1301, a map task is responsible for converting input records into <key, value> pairs, which are then passed to the reducer. The reducer is responsible for converting these pairs into SolrInputDocument objects and publishing them; the documents are then transformed into Solr indexes. The indexes are then persisted on HDFS directly, which can later...
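The division of work between the map and reduce phases can be sketched in plain Java. This is only an illustrative simulation, not the patch's actual code: it uses no Hadoop or Solr classes, the class name ReduceSideIndexingSketch and the "id,title" record layout are assumptions made for the example, and an ordinary Map stands in for SolrInputDocument.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, dependency-free simulation of reduce-side indexing.
// A real job would use Hadoop's Mapper/Reducer and Solr's SolrInputDocument.
public class ReduceSideIndexingSketch {

    // Map phase: turn one input record (assumed layout: "id,title")
    // into a <key, value> pair. No document is built here.
    static Map.Entry<String, String> map(String record) {
        int comma = record.indexOf(',');
        if (comma < 0) {
            throw new IllegalArgumentException("Expected 'id,title' but got: " + record);
        }
        String id = record.substring(0, comma).trim();
        String title = record.substring(comma + 1).trim();
        return Map.entry(id, title);
    }

    // Shuffle: group all values that share a key, as the framework
    // would do between the map and reduce phases.
    static Map<String, List<String>> shuffle(List<Map.Entry<String, String>> pairs) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (Map.Entry<String, String> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: build one document per key. The returned Map is a
    // stand-in for SolrInputDocument; in the real flow this is where the
    // document would be handed over for indexing.
    static Map<String, Object> reduce(String key, List<String> values) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", key);
        doc.put("title", new ArrayList<>(values));
        return doc;
    }

    // Runs the three steps in order and returns the documents produced by the reducers.
    static List<Map<String, Object>> run(List<String> records) {
        List<Map.Entry<String, String>> pairs = new ArrayList<>();
        for (String record : records) {
            pairs.add(map(record));
        }
        List<Map<String, Object>> docs = new ArrayList<>();
        for (Map.Entry<String, List<String>> group : shuffle(pairs).entrySet()) {
            docs.add(reduce(group.getKey(), group.getValue()));
        }
        return docs;
    }

    public static void main(String[] args) {
        List<String> records = List.of("1, Hadoop basics", "2, Solr in action", "1, Hadoop advanced");
        for (Map<String, Object> doc : run(records)) {
            System.out.println(doc);
        }
    }
}
```

The point of the sketch is the placement of the work: the map step only emits pairs, and document construction happens once per key in the reduce step, which mirrors the difference from map-side indexing described above.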