Configuring Solr with Nutch
Apache Solr can easily be configured for use with Nutch. We can perform the following steps to integrate Apache Nutch with Solr:
Create a new core (nutch-example) in Solr by copying the nutch-example folder from the Chapter 7 code that comes with this book.
After creating the new core, we just need to restart the Solr instance.
After we have restarted the Solr instance, let's crawl some data using Nutch and index it into Solr. To do this, we'll navigate to the %NUTCH_HOME% folder and execute the following command:
$ bin/crawl
Because we ran the script without any arguments, it prints its usage message:
Usage: crawl [-i|--index] [-D "key=value"] <Seed Dir> <Crawl Dir> <Num Rounds>
    -i|--index   Indexes crawl results into a configured indexer
    -D           A Java property to pass to Nutch calls
    Seed Dir     Directory in which to look for a seeds file
    Crawl Dir    Directory where the crawl/link/segments dirs are saved
    ...
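To make the usage message concrete, here is a minimal sketch of how the arguments might be filled in for a crawl that indexes into the nutch-example core. The Solr URL, the seed URL, the urls and crawl directory names, and the number of rounds are all assumptions for illustration. Passing the Solr location through the solr.server.url property is also an assumption: it applies to Nutch 1.x releases whose Solr indexer reads that property, so check it against your version.

```shell
#!/bin/sh
# Sketch only: every value below is an assumed placeholder, not taken from the book.
SOLR_URL="http://localhost:8983/solr/nutch-example"  # assumed URL of the new core
SEED_DIR="urls"     # <Seed Dir>: directory in which to look for a seeds file
CRAWL_DIR="crawl"   # <Crawl Dir>: where the crawl/link/segments dirs are saved
ROUNDS=2            # <Num Rounds>: number of crawl rounds

# Create the seed directory with a single seed URL (hypothetical choice).
mkdir -p "$SEED_DIR"
echo "http://nutch.apache.org/" > "$SEED_DIR/seed.txt"

# -i indexes the crawl results; -D passes a Java property to the Nutch calls.
CRAWL_CMD="bin/crawl -i -D solr.server.url=$SOLR_URL $SEED_DIR $CRAWL_DIR $ROUNDS"
echo "$CRAWL_CMD"

# Run the crawl only when the script is present, that is, inside the Nutch folder.
if [ -x bin/crawl ]; then
  $CRAWL_CMD
fi
```

Run from the Nutch folder, the script starts the crawl. Anywhere else, it only prepares the seed file and prints the command it would have executed, which is a convenient way to check the arguments first.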