The Requests and BeautifulSoup Libraries
We will take advantage of two Python libraries in this chapter: requests and BeautifulSoup. To avoid dealing with HTTP methods at a lower level, we will use the requests library. It is an API built on top of pure Python web utility libraries, which makes placing HTTP requests easy and intuitive.
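As a minimal sketch of what "easy and intuitive" means here, the snippet below builds a GET request without sending it, so you can see the URL that requests assembles from a base address and a dictionary of query parameters. The helper names preview_request and fetch, the example.com address, and the q parameter are illustrative assumptions, not part of the chapter's exercises.

```python
import requests


def preview_request(url, params=None):
    """Build (but do not send) a GET request and return its method and final URL."""
    prepared = requests.Request("GET", url, params=params).prepare()
    return prepared.method, prepared.url


def fetch(url, timeout=10):
    """Send a GET request and return the response object (needs network access)."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # raise an exception on 4xx/5xx status codes
    return response


if __name__ == "__main__":
    # example.com is a placeholder domain; nothing is sent over the network here.
    print(preview_request("https://example.com/search", {"q": "python"}))
```

Calling fetch with a real URL performs the actual request; the timeout argument is worth setting because requests.get otherwise waits indefinitely for a server that never answers.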
BeautifulSoup is one of the most popular HTML parser packages. It parses the HTML content you pass to it and builds a detailed tree of all the tags and markup within the page for easy and intuitive traversal. A programmer can use this tree to look for certain markup elements (for example, a table, a hyperlink, or a blob of text within a particular div ID) to scrape useful data.
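The following sketch illustrates that kind of tree traversal on a small, made-up HTML string, so it runs without any network access. The markup, the div IDs, and the variable names are assumptions chosen for illustration; the parser used is Python's built-in html.parser, which BeautifulSoup supports alongside other backends.

```python
from bs4 import BeautifulSoup

# A tiny, hypothetical page used only for demonstration.
html = """
<html><body>
  <div id="content">
    <p>Hello, <a href="https://example.com/a">first link</a>.</p>
    <table><tr><td>1</td><td>2</td></tr></table>
  </div>
  <div id="footer"><a href="https://example.com/b">second link</a></div>
</body></html>
"""

# Parse the markup into a navigable tree.
soup = BeautifulSoup(html, "html.parser")

# Locate one div by its ID, then search only inside that subtree.
content = soup.find("div", id="content")
links_in_content = [a["href"] for a in content.find_all("a")]
cells = [td.get_text() for td in content.find_all("td")]

# Searching from the root returns matches from the whole page.
all_links = [a["href"] for a in soup.find_all("a")]

print(links_in_content)
print(cells)
print(all_links)
```

Restricting the search to the content div is what lets a scraper ignore navigation bars, footers, and other markup that shares tag names with the data of interest.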
We are going to do a couple of exercises in order to demonstrate how to use the requests library and decode the contents of the response received when data is fetched from the server.