Scraping information from websites
Governments and jurisdictions around the world are increasingly embracing open data, which aims to increase citizen involvement, inform decision making, and make policies more open to public scrutiny. Examples of open data initiatives include data.gov (United States of America), data.gov.uk (United Kingdom), and data.gov.hk (Hong Kong).
These data portals often provide Application Programming Interfaces (APIs; see Chapter 7, Visualizing Online Data, for more details) for programmatic access to data. However, APIs are not available for some datasets; hence, we resort to good old web scraping techniques to extract information from websites.
BeautifulSoup (https://www.crummy.com/software/BeautifulSoup/) is an incredibly useful package used to scrape information from websites. Basically, everything marked with an HTML tag can be scraped with this wonderful package, from text, links, tables, and styles, to images...
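As a rough illustration of what scraping by tag can look like, the following minimal sketch parses a small HTML string with BeautifulSoup and pulls out the page title, links, table rows, and image sources. The HTML snippet, the variable names, and the choice of Python's built-in html.parser backend are assumptions made for this example; a real page would first have to be downloaded and would almost certainly be structured differently.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for a downloaded web page.
SAMPLE_HTML = """
<html>
  <head><title>Open data portals</title></head>
  <body>
    <h1>Portals</h1>
    <a href="https://www.data.gov">United States of America</a>
    <a href="https://data.gov.uk">United Kingdom</a>
    <table>
      <tr><th>Region</th><th>Portal</th></tr>
      <tr><td>Hong Kong</td><td>data.gov.hk</td></tr>
    </table>
    <img src="logo.png" alt="Logo">
  </body>
</html>
"""

# Parse the markup with the parser bundled with Python.
soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Text inside the <title> tag.
title = soup.title.string

# The href attribute of every <a> tag.
links = [a["href"] for a in soup.find_all("a")]

# Each table row as a list of cell texts (header and data cells alike).
rows = [
    [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
    for tr in soup.find_all("tr")
]

# The src attribute of every <img> tag.
images = [img["src"] for img in soup.find_all("img")]

print(title)
print(links)
print(rows)
print(images)
```

The same find_all pattern applies to any other tag; only the tag name and the attribute of interest change.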