So far, this book has focused on retrieving information from a single web page. Although this is the basis of web scraping, it does not cover the majority of use cases. More often than not, you will need to visit multiple web pages, or multiple websites, to collect all of the information you need. This may entail visiting known websites directly from a list of URLs, or following links discovered on some pages to places you did not know about in advance. There are many different ways of navigating your scraper through the web.
In this chapter, we will cover the following topics:
- How to follow links
- How to submit forms with POST requests
- How to track your history to avoid loops
- The difference between breadth-first and depth-first crawling
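As a rough preview of the last two topics, the sketch below crawls a small in-memory link graph instead of real pages. The `LINKS` dictionary, its URLs, and the `crawl` function are hypothetical names invented for illustration. A `visited` set keeps the crawl from looping, and the only difference between the two crawl orders is which end of the frontier the next URL is taken from.

```python
from collections import deque

# Hypothetical link graph standing in for real pages:
# each URL maps to the links found on that page.
LINKS = {
    "/": ["/a", "/b"],
    "/a": ["/c", "/"],  # links back to "/", which would loop without history
    "/b": ["/d"],
    "/c": [],
    "/d": ["/a"],
}


def crawl(start, breadth_first=True):
    """Return URLs in the order visited, never visiting one twice."""
    visited = set()            # history of pages already seen
    order = []
    frontier = deque([start])  # URLs waiting to be visited
    while frontier:
        # Queue (oldest first) gives breadth-first;
        # stack (newest first) gives depth-first.
        url = frontier.popleft() if breadth_first else frontier.pop()
        if url in visited:
            continue
        visited.add(url)
        order.append(url)
        for link in LINKS.get(url, []):
            if link not in visited:
                frontier.append(link)
    return order


print(crawl("/", breadth_first=True))
print(crawl("/", breadth_first=False))
```

With this graph, the breadth-first crawl visits both pages linked from `/` before going deeper, while the depth-first crawl follows one chain of links to its end before backing up. In a real scraper, the dictionary lookup would be replaced by fetching the page and extracting its links.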