As you add more and more target websites to your scraping requirements, you will eventually hit a point where you wish you could make more calls, faster. In a program that processes sites one at a time, the crawl delay for one site leaves the scraper sitting idle, adding unnecessary time before it can move on to the other sites. Do you see the problem in the following diagram?
![](https://static.packt-cdn.com/products/9781789615708/graphics/assets/49cfec0e-bab5-4c02-ae06-151795e343f5.png)
If these two sites could be run in parallel, there would not be any interference. It may also be that the time to access and parse a page is longer than the crawl delay for a given website, in which case launching a second request before the first response has finished processing could save you time as well. Look at how the situation improves in the following diagram:
![](https://static.packt-cdn.com/products/9781789615708/graphics/assets/29436d6c-1b52-402c-b32b-0b8dff8dfb14.png)
In either of these cases, you will need to introduce concurrency into your web scraper.
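To make the idea concrete before we dive into the details, here is a minimal sketch of the parallel approach in Go. The site names (`site-a.example.com`, `site-b.example.com`), page lists, and the two-second crawl delay are all made-up values for illustration; the point is simply that each site runs in its own goroutine, so one site's crawl delay no longer stalls requests to the others:

```go
package main

import (
	"fmt"
	"net/http"
	"sync"
	"time"
)

// scrapeSite fetches each page of a single site sequentially,
// honoring that site's crawl delay between requests.
func scrapeSite(name string, pages []string, crawlDelay time.Duration, wg *sync.WaitGroup) {
	defer wg.Done()
	for i, page := range pages {
		if i > 0 {
			time.Sleep(crawlDelay) // the delay only blocks this site, not the others
		}
		resp, err := http.Get(page)
		if err != nil {
			fmt.Printf("%s: %v\n", name, err)
			continue
		}
		resp.Body.Close()
		fmt.Printf("%s: fetched %s (%s)\n", name, page, resp.Status)
	}
}

func main() {
	// Hypothetical sites and pages, used only for illustration.
	sites := map[string][]string{
		"site-a": {"https://site-a.example.com/1", "https://site-a.example.com/2"},
		"site-b": {"https://site-b.example.com/1", "https://site-b.example.com/2"},
	}

	// Launch one goroutine per site so the sites are scraped in parallel.
	var wg sync.WaitGroup
	for name, pages := range sites {
		wg.Add(1)
		go scrapeSite(name, pages, 2*time.Second, &wg)
	}
	wg.Wait()
}
```

With only two goroutines this is straightforward, but as we will see in this chapter, coordinating many goroutines safely introduces its own set of challenges.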
In this chapter, we will cover the following topics:
- What is concurrency
- Concurrency pitfalls...