Scrapy setup and the application code
Scrapy is a Python library used to extract content from web pages or to crawl the pages linked to a given web page (see the Web crawlers (or spiders) section of Chapter 4, Web Mining Techniques, for more details). To install the library, type the following in the terminal:
sudo pip install Scrapy
Install the executable in the bin folder:
sudo easy_install scrapy
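To confirm that both installation steps took effect, a quick check along the following lines may help. This is a minimal sketch using only the Python standard library; the helper name scrapy_install_status is hypothetical and is not part of Scrapy.

```python
import importlib.util
import shutil


def scrapy_install_status():
    """Report whether the scrapy package and its command-line tool can be found.

    Hypothetical helper for illustration only; it is not part of Scrapy.
    """
    return {
        # True if `import scrapy` would find the package in this interpreter
        "package_importable": importlib.util.find_spec("scrapy") is not None,
        # True if a `scrapy` executable is reachable through PATH
        "cli_on_path": shutil.which("scrapy") is not None,
    }


if __name__ == "__main__":
    for check, ok in scrapy_install_status().items():
        print(f"{check}: {'ok' if ok else 'missing'}")
```

If either entry is reported as missing, the corresponding install command above probably needs to be rerun for the Python environment in use.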
From the movie_reviews_analyzer_app folder, we initialize our Scrapy project as follows:
scrapy startproject scrapy_spider
This command will create the following tree inside the scrapy_spider folder:
├── __init__.py
├── items.py
├── pipelines.py
├── settings.py
└── spiders
    └── __init__.py
The pipelines.py and items.py files manage how the scraped data is stored and manipulated, and they will be discussed later in the Spiders and Integrate Django with Scrapy sections. The settings.py file sets the parameters each spider (or crawler) defined in the spiders folder uses to operate...
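As a rough illustration of the kind of parameters settings.py holds, the sketch below shows a few standard Scrapy settings. The first three names are the ones scrapy startproject typically generates for a project called scrapy_spider; the remaining values and the pipeline class path are assumptions chosen for illustration, not the configuration used in this application.

```python
# settings.py (illustrative sketch; the values below are assumptions)

# Project identity and where Scrapy looks for spider modules
BOT_NAME = "scrapy_spider"
SPIDER_MODULES = ["scrapy_spider.spiders"]
NEWSPIDER_MODULE = "scrapy_spider.spiders"

# Wait this many seconds between requests to the same site (assumed value)
DOWNLOAD_DELAY = 1.0

# Do not follow links more than this many hops from the start page (assumed value)
DEPTH_LIMIT = 2

# Enable a pipeline defined in pipelines.py; the class path is hypothetical
# and the number sets the order in which enabled pipelines run
ITEM_PIPELINES = {
    "scrapy_spider.pipelines.ScrapySpiderPipeline": 300,
}
```

Every spider in the spiders folder reads these module-level values at startup, so changing one of them here affects all crawlers in the project.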