Web scraping using Beautiful Soup
In this section, we will build and execute a web crawler using Beautiful Soup. To set things up, we have chosen to scrape quotes from http://quotes.toscrape.com. Specifically, we will be scraping from the page http://quotes.toscrape.com/tag/inspirational, as seen in Figure 5.3:
data:image/s3,"s3://crabby-images/721d1/721d1bd9c0a4d670b7a9ae02d79520e85ce97c70" alt="Figure 5.3: Category “inspirational”"
Figure 5.3: Category “inspirational”
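The sketch below shows how a single quote block could be parsed with Beautiful Soup. It runs offline and is only a minimal illustration. SAMPLE_HTML is hypothetical markup that assumes the div.quote / span.text / small.author / a.tag layout used on quotes.toscrape.com, and parse_quotes() is a hypothetical helper name, not the book's code. In a live run you would pass the downloaded page source (for example, requests.get(url).text) instead of the sample string.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, assumed to mirror one quote block on
# http://quotes.toscrape.com/tag/inspirational (nothing is fetched here).
SAMPLE_HTML = """
<div class="quote">
  <span class="text">A sample quote.</span>
  <span>by <small class="author">Sample Author</small>
    <a href="/author/Sample-Author">(about)</a></span>
  <div class="tags">Tags:
    <a class="tag" href="/tag/inspirational/page/1/">inspirational</a>
    <a class="tag" href="/tag/life/page/1/">life</a>
  </div>
</div>
"""


def parse_quotes(html):
    """Return one dict per div.quote block: text, author, author link, tags."""
    soup = BeautifulSoup(html, "html.parser")  # stdlib parser, no lxml needed
    quotes = []
    for block in soup.select("div.quote"):
        quotes.append({
            "text": block.select_one("span.text").get_text(strip=True),
            "author": block.select_one("small.author").get_text(strip=True),
            # Relative link to the author's detail page.
            "author_link": block.select_one("a[href^='/author/']")["href"],
            "tags": [a.get_text(strip=True) for a in block.select("a.tag")],
        })
    return quotes


quotes = parse_quotes(SAMPLE_HTML)
print(quotes[0]["author"], quotes[0]["tags"])
```

CSS selectors via select() and select_one() keep each field lookup on one line; find() and find_all() with a class_ argument would work equally well.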
This example is similar to Example 3 – scraping quotes with author details in Chapter 4. Only the links and the way the code is composed have changed, with a few additional logical steps. The code for this example can be found on GitHub: https://github.com/PacktPublishing/Hands-On-Web-Scraping-with-Python-Second-Edition/blob/main/Chapter05/bs4_scraping.ipynb.
The following code declares the paginated link as url, and columns contains the column headers for the CSV file to be generated:
url = "http://quotes.toscrape.com/tag/inspirational/page/"
columns = ['id'...
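One way url and columns could be put to work is sketched below, under stated assumptions. Page URLs are built by appending a page number and a trailing slash to url. The column names are placeholders, because the book's actual list is cut off above. page_urls() and rows_to_csv() are hypothetical helpers, and the CSV is written to an in-memory buffer so the sketch runs without touching the disk or the network.

```python
import csv
import io

url = "http://quotes.toscrape.com/tag/inspirational/page/"
# Hypothetical column names; the book's real list is truncated in the text.
columns = ["id", "quote", "author", "tags"]


def page_urls(base, last_page):
    """Build paginated URLs by appending page numbers 1..last_page to base."""
    return [f"{base}{n}/" for n in range(1, last_page + 1)]


def rows_to_csv(rows, fieldnames):
    """Serialize a list of dict rows to CSV text, header line first."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Assumed page count of 3, purely for illustration.
urls = page_urls(url, 3)
sample_row = {"id": 1, "quote": "A sample quote.",
              "author": "Sample Author", "tags": "inspirational|life"}
csv_text = rows_to_csv([sample_row], columns)
print(urls[0])
print(csv_text)
```

In a real crawl, each URL would be fetched and parsed in turn, and the loop would stop once a page no longer offers a next-page link rather than relying on a fixed page count.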