Web Scraping
In this final chapter of the Dealing with Data part of the book, we'll learn how to collect data from web sources. This includes using Python modules and packages to scrape data straight from webpages and Application Programming Interfaces (APIs). We'll also learn how to use so-called "wrappers" around APIs to collect and store data. Because new data is created on the internet every day, web scraping opens up huge opportunities for data collection.
In this chapter, we'll cover the following:
- Understanding the structure of the internet
- Performing simple web scraping
- Parsing HTML from scraped pages
- Using APIs to collect data
- The ethics and legality of web scraping
We'll learn these topics using the following Python packages and modules:
- urllib
- beautifulsoup4
- lxml
- requests
- Selenium
- praw (a Reddit API wrapper)
Web scraping is a fun...