Technical requirements
You will need the Google Chrome or Mozilla Firefox web browser, and we will be using Python notebooks in JupyterLab.
Please refer to the Setting things up and Creating a virtual environment sections of Chapter 2 and continue using the environment we created.
The Python libraries that are required for this chapter are as follows:
lxml
urllib (part of the Python standard library, so it does not need to be installed separately)
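Before starting, it can help to confirm that both libraries can be imported from the environment created in Chapter 2. The following is a minimal sketch. The helper name check_requirements is hypothetical and not part of either library. It only attempts each import and reports the result.

```python
import importlib


def check_requirements(modules=("lxml.etree", "urllib.request")):
    """Try to import each module and report which ones are available.

    check_requirements is a hypothetical helper for this sketch.
    """
    status = {}
    for name in modules:
        try:
            importlib.import_module(name)
            status[name] = True
        except ImportError:  # also catches ModuleNotFoundError
            status[name] = False
    return status


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(f"{name}: {'OK' if ok else 'missing'}")
```

If lxml is reported as missing, it can be installed into the active virtual environment with pip install lxml. urllib ships with Python and should always be reported as available.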
The code files for this chapter are available online on GitHub: https://github.com/PacktPublishing/Hands-On-Web-Scraping-with-Python-Second-Edition/tree/main/Chapter03.
Important note
Web- or website-based content refers to the responses, or page sources, that are received after a request to a URL has been processed. Content can be of various types, such as PDF, CSV, TXT, XML, HTML, and JSON. In this chapter, unless stated otherwise, we treat HTML (the page source or markup document) as our primary content.
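One common way to tell which type of content a response carries is its Content-Type header. The following minimal sketch uses urllib to fetch a URL and lxml to parse the body only when the server reports HTML. The helper names fetch and parse_if_html are hypothetical. The sketch also assumes that the header is accurate, which servers do not always guarantee.

```python
from urllib.request import urlopen

from lxml import html


def fetch(url, timeout=10):
    """Request a URL and return (content_type, body_bytes).

    fetch is a hypothetical helper for this sketch.
    """
    with urlopen(url, timeout=timeout) as response:
        # get_content_type() returns e.g. "text/html" without the charset part
        content_type = response.headers.get_content_type()
        body = response.read()
    return content_type, body


def parse_if_html(content_type, body):
    """Parse the body into an lxml HTML tree only if it is HTML; else None.

    parse_if_html is a hypothetical helper for this sketch.
    """
    if content_type != "text/html":
        return None  # e.g. JSON, CSV, or PDF content is left for other parsers
    return html.fromstring(body)
```

For example, calling fetch("https://example.com") and passing the result to parse_if_html would give a parsed tree for an HTML page, whereas a JSON API response would return None.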