Now that Scrapy is installed, we can run the startproject command to generate the default structure for our first Scrapy project.
To do this, open a terminal, navigate to the directory where you want to store your Scrapy project, and run scrapy startproject <project name>. Here, we will use example as the project name:
$ scrapy startproject example
$ cd example
Here are the files generated by the startproject command:
scrapy.cfg
example/
    __init__.py
    items.py
    middlewares.py
    pipelines.py
    settings.py
    spiders/
        __init__.py
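As a quick sanity check, the layout above can be verified from Python. The following is only a minimal sketch: the names EXPECTED_PATHS and missing_paths are hypothetical helpers introduced for illustration and are not part of Scrapy, and the script assumes it is run from the project root (the directory that contains scrapy.cfg).

```python
from pathlib import Path

# Paths that `scrapy startproject example` is expected to create,
# relative to the project root (the directory containing scrapy.cfg).
EXPECTED_PATHS = [
    "scrapy.cfg",
    "example/__init__.py",
    "example/items.py",
    "example/middlewares.py",
    "example/pipelines.py",
    "example/settings.py",
    "example/spiders/__init__.py",
]


def missing_paths(project_root="."):
    """Return the expected paths that do not exist under project_root."""
    root = Path(project_root)
    return [p for p in EXPECTED_PATHS if not (root / p).exists()]


if __name__ == "__main__":
    missing = missing_paths()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All expected files are present.")
```

If the list of missing paths is not empty, the most likely cause is that the script was run from the wrong directory, or that the installed Scrapy version generates a slightly different template.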
The important files for this chapter (and in general for Scrapy use) are as follows:
- items.py: This file defines a model of the fields that will be scraped
- settings.py: This file defines settings, such as the user agent and crawl delay
- spiders/: The actual scraping and crawling code is stored...
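To make the role of settings.py more concrete, here is a sketch of the kind of entries it might contain. The setting names (BOT_NAME, SPIDER_MODULES, NEWSPIDER_MODULE, USER_AGENT, ROBOTSTXT_OBEY, DOWNLOAD_DELAY, and CONCURRENT_REQUESTS_PER_DOMAIN) are standard Scrapy settings, but every value shown is an assumption chosen for illustration, in particular the user agent string and the five-second delay; the generated file on your machine will differ.

```python
# example/settings.py (illustrative excerpt; values are assumptions)

# Project name and the packages where Scrapy looks for spiders.
BOT_NAME = "example"
SPIDER_MODULES = ["example.spiders"]
NEWSPIDER_MODULE = "example.spiders"

# User agent sent with each request (placeholder value).
USER_AGENT = "example-bot (+http://www.example.com)"

# Respect the rules published in each site's robots.txt.
ROBOTSTXT_OBEY = True

# Base delay, in seconds, between consecutive requests to the same site.
DOWNLOAD_DELAY = 5

# Limit how many requests run in parallel against a single domain.
CONCURRENT_REQUESTS_PER_DOMAIN = 1
```

Because settings.py is an ordinary Python module, each setting is just a module-level constant, and Scrapy reads them when the crawl starts.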