Installing Apache Nutch
Apache Nutch comes in two versions (1.x and 2.x). For this example, we'll use version 1.x because it ships as a prebuilt binary, which saves the time it would take to build version 2.x from scratch. The latest stable release with a binary distribution at the time of writing this book is v1.10, and it can be installed by following these steps:
Download Apache Nutch (apache-nutch-1.10-bin.tar.gz) from http://nutch.apache.org/downloads.html. Extract the archive file into a folder of your choice. We'll use %NUTCH_HOME% to refer to the folder where the archive is extracted.
Note
On Windows, the Nutch scripts need a Unix-like shell to run. We can install Cygwin for this purpose from http://cygwin.com/install.html.
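The extraction step can be sketched as a small shell function. This is a minimal sketch, not part of Nutch itself: the function name extract_nutch is hypothetical, and it assumes the archive has already been downloaded and unpacks into a single top-level directory, which is why --strip-components=1 is used.

```shell
# Hypothetical helper (not shipped with Nutch): unpack a downloaded Nutch
# binary archive into a folder of our choice.
# Usage: extract_nutch <archive.tar.gz> <destination-folder>
extract_nutch() {
    archive="$1"
    dest="$2"

    # Stop early if the archive has not been downloaded yet
    if [ ! -f "$archive" ]; then
        echo "archive not found: $archive" >&2
        return 1
    fi

    # Create the destination folder if it does not exist
    mkdir -p "$dest" || return 1

    # Assumption: the archive contains one top-level directory.
    # --strip-components=1 drops it so that bin/ sits directly under $dest.
    tar -xzf "$archive" -C "$dest" --strip-components=1
}
```

For example, after setting NUTCH_HOME="$HOME/nutch" we could call extract_nutch apache-nutch-1.10-bin.tar.gz "$NUTCH_HOME". In a Unix-like shell the folder is written as $NUTCH_HOME rather than %NUTCH_HOME%.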
Let's verify the downloaded archive by going to %NUTCH_HOME%/bin. It will contain the nutch script, which we can execute. We run the following command to get a list of available options that we can use:
$ cd %NUTCH_HOME%/bin
$ ./nutch
We should get the following output from the command:
Usage...
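If the command fails instead of printing its usage text, a quick check of the extracted folder can help narrow down the cause. The following is a minimal sketch; check_nutch is a hypothetical helper name, and it only confirms that bin/nutch exists and is executable.

```shell
# Hypothetical helper (not shipped with Nutch): confirm that an extracted
# folder contains a runnable bin/nutch script.
# Usage: check_nutch <nutch-home-folder>
check_nutch() {
    script="$1/bin/nutch"

    # Return 1 when the script is missing (wrong folder or failed extraction)
    if [ ! -f "$script" ]; then
        echo "missing: $script" >&2
        return 1
    fi

    # Return 2 when the script is present but lacks execute permission
    if [ ! -x "$script" ]; then
        echo "not executable: $script (try: chmod +x \"$script\")" >&2
        return 2
    fi

    echo "found: $script"
}
```

Calling check_nutch "$NUTCH_HOME" prints the path of the script when both checks pass, and a short diagnostic on standard error otherwise.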