While discussing NLTK, we highlighted the significance of a corpus or large repository of text for NLP research. While the available corpora are quite useful, NLP researchers may require the text of a particular subject. For example, someone trying to build a sentiment analyzer for financial markets may not find the available corpus (presidential speeches, movie reviews, and so on) particularly useful. Consequently, NLP researchers may have to get data from other sources. Web scraping is an extremely useful tool in this regard as it lets users retrieve information from web sources programmatically.
Before we start discussing web scraping, we wish to underscore the importance of complying with the respective website policies on web scraping. Most websites allow web scraping for individual non-commercial use, but you must always confirm the policy before scraping a website.
To perform web scraping, we will be using a test website (https://webscraper...