Now that we understand the structure of this web page, we will investigate three different approaches to scraping its data: first with regular expressions, then with the popular BeautifulSoup module, and finally with the powerful lxml module.
Three approaches to scrape a web page
Regular expressions
If you are unfamiliar with regular expressions or need a reminder, there is a thorough overview available at https://docs.python.org/3/howto/regex.html. Even if you already use regular expressions (regex) in another programming language, we recommend stepping through it as a refresher on how regex works in Python.
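As a quick warm-up, here is a minimal sketch of pulling values out of HTML with Python's built-in re module. The HTML fragment, the row IDs, and the class names are invented purely for illustration and are not taken from the page we inspected earlier; only re.findall and re.search are real library calls.

```python
import re

# Hypothetical HTML fragment, invented for this illustration only.
html = """
<table>
<tr id="row_area"><td class="label">Area: </td><td class="value">244,820 square kilometres</td></tr>
<tr id="row_population"><td class="label">Population: </td><td class="value">62,348,447</td></tr>
</table>
"""

# The non-greedy group (.*?) captures the text of every cell with class "value".
values = re.findall(r'<td class="value">(.*?)</td>', html)
print(values)

# Anchoring the pattern on the (hypothetical) row ID selects a single field.
match = re.search(r'<tr id="row_area">.*?<td class="value">(.*?)</td>', html)
area = match.group(1) if match else None
print(area)
```

Note that the second pattern depends on the exact attribute order and quoting of the markup, so even a small change to the page layout could stop it from matching.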
Because each chapter might build on or reuse parts of previous chapters, we recommend setting up your file structure to mirror the one in the book repository. All code can then...