Introducing web scraping
First, what even is web scraping, and who can do it? Anyone with some programming skill can scrape, and several programming languages are suitable, but we will use Python. Web scraping is the act of harvesting content from web resources so that you can use the data in your products and software. You can use scraping to pull information that a website hasn’t exposed as a data feed or through an API. But one warning: do not scrape too aggressively; otherwise, you could take down a web server through an accidental denial-of-service (DoS) attack. Get only what you need, only as often as you need it. Go slow. Don’t be greedy or selfish.
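To make the "go slow" advice concrete, here is a minimal sketch of fetching a list of pages with a pause between requests. It uses only the Python standard library. The function names (fetch, fetch_politely), the two-second default delay, and the User-Agent string are assumptions made for illustration, not values recommended by any particular website.

```python
import time
import urllib.request


def fetch(url, timeout=10):
    """Download one page and return its body as text."""
    # The User-Agent value is a made-up example; identify your own scraper here.
    request = urllib.request.Request(
        url, headers={"User-Agent": "polite-scraper-example"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def fetch_politely(urls, delay_seconds=2.0, fetcher=fetch, sleeper=time.sleep):
    """Fetch each URL in turn, pausing between consecutive requests.

    fetcher and sleeper are parameters so the pacing logic can be tried
    out without touching the network.
    """
    pages = {}
    for index, url in enumerate(urls):
        if index > 0:
            # Wait before every request except the first one.
            sleeper(delay_seconds)
        pages[url] = fetcher(url)
    return pages
```

Calling fetch_politely with a list of URLs returns a dictionary that maps each URL to its page text. A longer delay is kinder to small servers, so adjust delay_seconds to suit the site you are working with.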
Introducing BeautifulSoup
BeautifulSoup is a powerful Python library for parsing HTML and XML, which makes it useful for scraping anything that you have access to online. I frequently use it to harvest story URLs from news websites, and then I scrape each of these URLs for their text content. I typically do not want the actual HTML, CSS, or JavaScript...
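The workflow just described (collect story URLs, then pull the text out of each page) might be sketched as follows. This is an illustration under stated assumptions, not the exact code used later: the sample HTML, the news.example.com address, the "/stories/" path marker, and the helper names extract_story_urls and extract_text are all hypothetical. It assumes the beautifulsoup4 package is installed, and it parses an inline HTML string so that it runs without touching the network.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical page content standing in for a real news front page.
SAMPLE_HTML = """
<html><head><title>Example News</title><style>p {color: black;}</style></head>
<body>
<a href="/stories/first-story">First story</a>
<a href="https://news.example.com/stories/second-story">Second story</a>
<a href="/about">About</a>
<p>Hello, <b>world</b>.</p>
<script>console.log("ignored");</script>
</body></html>
"""


def extract_story_urls(html, base_url, path_marker="/stories/"):
    """Return absolute URLs of links whose address contains path_marker."""
    soup = BeautifulSoup(html, "html.parser")
    urls = []
    for link in soup.find_all("a", href=True):
        # Resolve relative links against the page's own address.
        absolute = urljoin(base_url, link["href"])
        if path_marker in absolute and absolute not in urls:
            urls.append(absolute)
    return urls


def extract_text(html):
    """Return the visible text of a page, without script or style content."""
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup(["script", "style"]):
        tag.decompose()  # Remove the tag and everything inside it.
    # Collapse runs of whitespace left behind by the markup.
    return " ".join(soup.get_text(separator=" ").split())


story_urls = extract_story_urls(SAMPLE_HTML, "https://news.example.com/")
page_text = extract_text(SAMPLE_HTML)
```

On a real site, the path marker would need to match however that site structures its story links, which varies from one publisher to another.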