4. Collecting Text Data with Web Scraping and APIs
Overview
This chapter introduces you to the concept of web scraping. You will first learn how to extract data (such as text, images, lists, and tables) from pages that are written using HTML. You will then learn about the various types of semi-structured data used to create web pages (such as JSON and XML) and extract data from them. Finally, you will use APIs for data extraction from Twitter, using the tweepy
package.