Starting a project
Now that Scrapy is installed, we can run the startproject command to generate the default structure for this project. To do this, open the terminal and navigate to the directory where you want to store your Scrapy project, and then run scrapy startproject <project name>. Here, we will use example for the project name:
$ scrapy startproject example
$ cd example
Here are the files generated by the scrapy command:
scrapy.cfg
example/
    __init__.py
    items.py
    pipelines.py
    settings.py
    spiders/
        __init__.py
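As a quick sanity check, the listing above can be compared against what is actually on disk. The following is a minimal sketch rather than part of Scrapy itself: the helper name missing_project_files and the EXPECTED_FILES list are hypothetical and simply mirror the files named in this section. Other Scrapy versions may generate additional files, which this check ignores.

```python
from pathlib import Path

# Relative paths taken from the listing in this section (assumption: the
# project was created with `scrapy startproject example`).
EXPECTED_FILES = [
    "scrapy.cfg",
    "example/__init__.py",
    "example/items.py",
    "example/pipelines.py",
    "example/settings.py",
    "example/spiders/__init__.py",
]


def missing_project_files(root, expected=EXPECTED_FILES):
    """Return the expected relative paths that are not files under root."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).is_file()]


if __name__ == "__main__":
    # Run from the top-level project directory (the one containing scrapy.cfg).
    missing = missing_project_files(".")
    print("Missing files:", missing if missing else "none")
```

Run from the directory entered with cd example, the script should report no missing files if the project was generated as described.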
The important files for this chapter are as follows:
items.py: This file defines a model of the fields that will be scraped
settings.py: This file defines settings, such as the user agent and crawl delay
spiders/: The actual scraping and crawling code is stored in this directory
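To make the role of settings.py more concrete, here is a minimal sketch of what such a file might contain for the example project. The setting names (BOT_NAME, SPIDER_MODULES, NEWSPIDER_MODULE, USER_AGENT, DOWNLOAD_DELAY) are standard Scrapy settings, but the user agent string and the delay value are illustrative assumptions, not values taken from the generated file.

```python
# example/settings.py (sketch; values below are illustrative assumptions)

# Name of the project and where Scrapy should look for spiders.
BOT_NAME = "example"
SPIDER_MODULES = ["example.spiders"]
NEWSPIDER_MODULE = "example.spiders"

# User agent sent with each request (hypothetical string).
USER_AGENT = "example-bot (+http://www.example.com)"

# Crawl delay: seconds to wait between requests to the same website
# (assumed value; choose one appropriate for the target site).
DOWNLOAD_DELAY = 5
```

Because settings.py is an ordinary Python module, the settings are plain module-level variables, so they can be adjusted without touching the spider code.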
Additionally, Scrapy uses scrapy.cfg for project configuration and pipelines.py to process the scraped fields, but they will not...