In this section, we'll learn how to extract links and then use them to make the crawler recursive. Now that we have created the basic structure of a crawler, let's add some functionality:
- First, let's copy the prepared file for this exercise: copy examples/spiders/spiderman-recursive.py to basic_crawler/basic_crawler/spiders/spiderman.py.
- Then, go back to our editor. As we would like to make the crawler recursive, we will once again work on the spiderman.py file and start by adding another extractor. This time, however, we'll extract links instead of titles, as highlighted in the following screenshot (and illustrated in the sketch after this list):
- Also, we need to make sure that the links are valid and complete, so we'll create a regular expression that validates the links, as highlighted in the following screenshot.
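Here is a minimal sketch of what the updated spiderman.py could look like after both changes. The domain, start URL, XPath expressions, and the exact regular expression are assumptions made for illustration; the prepared file in examples/spiders/spiderman-recursive.py will differ in its details.

```python
# Sketch only: domain, start URL, XPaths, and the regex are placeholders.
import re

import scrapy


class SpidermanSpider(scrapy.Spider):
    name = "spiderman"
    allowed_domains = ["example.com"]        # assumed domain, replace with yours
    start_urls = ["https://example.com/"]    # assumed start page

    # Regular expression used to keep only complete, valid HTTP(S) links.
    link_pattern = re.compile(r"^https?://[\w.\-]+(/[\w./?%&=\-]*)?$")

    def parse(self, response):
        # Existing extractor from the earlier exercise: page titles.
        for title in response.xpath("//h1/text()").extract():
            yield {"title": title.strip()}

        # New extractor: links. response.urljoin() turns relative hrefs
        # into absolute URLs so the regex can validate them.
        for href in response.xpath("//a/@href").extract():
            url = response.urljoin(href)
            if self.link_pattern.match(url):
                # Follow each valid link with the same callback.
                yield scrapy.Request(url, callback=self.parse)
```

Yielding a new scrapy.Request with the same parse callback for every validated link is what makes the crawler recursive: each followed page is parsed by the same method, which in turn extracts and follows the links it finds.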