Why cover Requests and BeautifulSoup?
We all like it when things are easy, but life is challenging, and things don’t always work out the way we want. In scraping, that’s laughably common. Initially, you can count on more things going wrong than going right, but if you are persistent and know your options, you will eventually get the data that you want.
In the previous chapter, we covered the Requests Python library because it gives you the ability to access and use any publicly available web data. You get a lot of freedom when working with Requests. It gets you the raw data, but making that data useful is difficult and time-consuming. We then used BeautifulSoup because it is a rich library for dealing with HTML. With BeautifulSoup, you can be more specific about the kinds of data you extract and use from a web resource. For instance, we can easily harvest all of the external links from a website, or even the full text of a website, excluding all HTML.
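As a reminder of what those two tasks look like, here is a minimal sketch. The helper names (fetch_html, external_links, visible_text) and the choice of the built-in "html.parser" backend are illustrative assumptions, not the exact code from the previous chapter, and "external" is taken here to mean any http(s) link whose host differs from the page's own host.

```python
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup


def fetch_html(url):
    """Download a page with Requests and return its HTML as text."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    return response.text


def external_links(html, base_url):
    """Return absolute links in `html` whose host differs from `base_url`'s host."""
    soup = BeautifulSoup(html, "html.parser")
    base_host = urlparse(base_url).netloc
    links = []
    for anchor in soup.find_all("a", href=True):
        # Resolve relative hrefs against the page URL before comparing hosts.
        absolute = urljoin(base_url, anchor["href"])
        parsed = urlparse(absolute)
        if parsed.scheme in ("http", "https") and parsed.netloc != base_host:
            links.append(absolute)
    return links


def visible_text(html):
    """Return the text of a page with all HTML markup removed."""
    soup = BeautifulSoup(html, "html.parser")
    # Script and style contents are not markup, so drop them explicitly.
    for tag in soup(["script", "style"]):
        tag.decompose()
    return soup.get_text(separator=" ", strip=True)
```

Typical usage would be to call fetch_html on a page you are allowed to scrape and pass the result to either helper, for example external_links(fetch_html(url), url). Keeping the download separate from the parsing makes it easy to try the parsing functions on a saved HTML string first.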
However...