Configuring Apache Tika in Solr
Let's go ahead and create a new core called tika-example in our Solr instance. To make things easier, you can copy the core from the Chapter 6 folder of the ZIP file that comes with this book. After creating the core, we'll need to configure solrconfig.xml.
We need to load the extraction libraries from the %SOLR_HOME%/contrib/extraction/lib folder, along with the solr-cell library, by adding the following lib directives to solrconfig.xml:
<lib dir="${solr.install.dir:../../..}/contrib/extraction/lib" regex=".*\.jar"/>
<lib dir="${solr.install.dir:../../..}/dist/" regex="solr-cell-\d.*\.jar"/>
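To see which files the two regex attributes above would pick up, here is a minimal Python sketch. It assumes that Solr applies each pattern to the whole file name (full-match semantics). The file names in the listing are hypothetical, since the real ones depend on your Solr version, and matching_jars is an illustrative helper, not part of Solr.

```python
import re

# Patterns copied from the two <lib> directives above.
EXTRACTION_LIB_REGEX = r".*\.jar"
SOLR_CELL_REGEX = r"solr-cell-\d.*\.jar"


def matching_jars(regex, filenames):
    """Return the file names that the regex matches in full.

    Assumption: Solr tests the pattern against the entire file name,
    so fullmatch is used here rather than search.
    """
    pattern = re.compile(regex)
    return [name for name in filenames if pattern.fullmatch(name)]


# Hypothetical directory listing; real names depend on the Solr version.
dist_files = [
    "solr-cell-4.10.2.jar",
    "solr-core-4.10.2.jar",
    "solr-cell-docs.txt",
]

print(matching_jars(SOLR_CELL_REGEX, dist_files))
```

The \d after solr-cell- requires a digit at the start of the version, so under these assumptions only the versioned solr-cell JAR is selected and other files in the dist folder are ignored.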
We can then configure ExtractingRequestHandler in solrconfig.xml:
<requestHandler name="/update/extract" class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="fmap.content">content</str>
    <str name="lowernames">true</str>
    <str name="uprefix">attr_</str>
    <str name="captureAttr">false</str...