4. Collecting Text Data with Web Scraping and APIs
Activity 4.01: Extracting Information from an Online HTML Page
Solution
Let's extract the data from an online source and analyze it. Follow these steps to implement this activity:
- Open a Jupyter Notebook.
- Import the requests and BeautifulSoup libraries. Pass the URL to requests.get() to fetch the page, then parse the fetched HTML using the BeautifulSoup HTML parser. Add the following code to do this:

    import requests
    from bs4 import BeautifulSoup

    r = requests.get('https://en.wikipedia.org/wiki/Rabindranath_Tagore')
    r.status_code  # 200 means the request succeeded
    soup = BeautifulSoup(r.text, 'html.parser')
- To extract the list of headings, see which HTML elements belong to each bold headline in the Works section. You can see that they belong to the h3 tag. We only need the first six headings here. Look for a span tag that has a class attribute with the following set of commands...
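
As a minimal sketch of the heading-extraction step described above, and not necessarily the exact commands used in this solution, the following assumes that each heading is an h3 element wrapping a span whose class is mw-headline. That class name, the SAMPLE_HTML fragment, and the extract_headings helper are all assumptions introduced for illustration; the markup on the live page may differ, so inspect it in the browser first.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML fragment standing in for the fetched page.
# The real page's markup may differ from this.
SAMPLE_HTML = """
<h2><span class="mw-headline">Works</span></h2>
<h3><span class="mw-headline">Drama</span></h3>
<h3><span class="mw-headline">Short stories</span></h3>
<h3>No span here</h3>
<h3><span class="mw-headline">Novels</span></h3>
"""


def extract_headings(html, limit=6, span_class="mw-headline"):
    """Return the text of up to `limit` h3 headings that wrap a span of `span_class`."""
    soup = BeautifulSoup(html, "html.parser")
    headings = []
    for h3 in soup.find_all("h3"):
        # Skip h3 elements that do not contain the expected span.
        span = h3.find("span", class_=span_class)
        if span is not None:
            headings.append(span.get_text(strip=True))
        if len(headings) == limit:
            break
    return headings


headings = extract_headings(SAMPLE_HTML)
print(headings)  # ['Drama', 'Short stories', 'Novels']
```

To try this against the live page, pass r.text from the earlier step in place of SAMPLE_HTML. If the result is empty, the page probably uses a different class name or heading structure than the one assumed here.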