Summary
In this chapter, we worked through the process of pulling tables from Wikipedia using web scraping techniques, cleaning up the resulting data with pandas, and producing a final analysis.
We started by looking at how HTTP requests work, focusing on GET requests and their response status codes. Then, we moved into the Jupyter Notebook and made HTTP requests with Python using the requests library. We saw how Jupyter can render HTML in the notebook, including live web pages that can be interacted with. To learn about web scraping, we saw how BeautifulSoup can be used to parse text out of HTML, and used this library to scrape tabular data from Wikipedia.
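As a quick refresher, a minimal sketch of that scraping workflow might look like the following; the URL, the "wikitable" class lookup, and the choice of the first matching table are assumptions about the page layout rather than the chapter's exact code:

```python
import requests
from bs4 import BeautifulSoup

# Assumed Wikipedia page; the chapter's exact URL may differ.
url = "https://en.wikipedia.org/wiki/List_of_countries_by_central_bank_interest_rates"

response = requests.get(url)     # issue the GET request
print(response.status_code)      # 200 indicates success

soup = BeautifulSoup(response.text, "html.parser")

# Wikipedia marks its data tables with the "wikitable" class
# (assumption: the first such table is the one we want).
table = soup.find("table", class_="wikitable")

# Pull the text out of each row's header and data cells.
rows = []
for tr in table.find_all("tr"):
    cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
    if cells:
        rows.append(cells)

print(rows[:3])  # header row plus the first two data rows
```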
After pulling two tables of data, we processed them for analysis with pandas. The first table contained each country's central bank interest rate, while the second contained its population. We combined these into a single table that was then used for the final analysis, which involved...
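A minimal sketch of that combination step, using hypothetical column names and placeholder values rather than the chapter's actual data, might look like this:

```python
import pandas as pd

# Placeholder DataFrames standing in for the two cleaned-up scraped tables;
# the column names and values here are illustrative only.
interest_rates = pd.DataFrame({
    "Country": ["Country A", "Country B", "Country C"],
    "Interest rate (%)": [1.0, 2.5, 5.0],
})
populations = pd.DataFrame({
    "Country": ["Country A", "Country B", "Country C"],
    "Population": [1_000_000, 5_000_000, 20_000_000],
})

# An inner join on the country name keeps only countries present in both tables.
combined = interest_rates.merge(populations, on="Country", how="inner")
print(combined)
```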