First approach: scripting
Now, let's start writing the script. We'll go through the source in three steps: imports, argument parsing, and business logic.
The imports
Here's how the script starts:
# scrape.py
import argparse
import base64
import json
from pathlib import Path
from bs4 import BeautifulSoup
import requests
Going through the imports from the top, you can see that we'll need to parse the arguments, which we'll feed to the script itself (using argparse). We will need the base64 library to save the images within a JSON file (so we will also need json), and we'll need to open files for writing (using pathlib). Finally, we'll need BeautifulSoup for scraping the web page easily, and requests to fetch its content. We assume you're familiar with requests, as we used it in Chapter 8, Files and Data Persistence.
We will explore the HTTP protocol and the requests mechanism in Chapter 14, Introduction to API Development...
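If you want to see how these pieces fit together before we dive into the real script, here is a minimal, self-contained sketch. It is not the scrape.py we are building: it fetches a placeholder page, lists the image sources it finds, and shows why base64 is needed before image bytes can be embedded in JSON. The URL, filename, and dictionary keys are made up purely for illustration.
# sketch.py - a standalone illustration, not the scrape.py we are writing
import base64
import json
import requests
from bs4 import BeautifulSoup

# Fetch a page and parse its HTML (the URL is just a placeholder).
response = requests.get("https://example.com")
response.raise_for_status()
soup = BeautifulSoup(response.content, "html.parser")

# Collect the src attribute of every <img> tag on the page.
sources = [img.get("src") for img in soup.find_all("img")]
print(sources)

# JSON cannot hold raw bytes, so an image must be base64-encoded
# before it can travel inside a JSON document.
img_bytes = b"\x89PNG..."  # pretend these are the bytes of a downloaded image
img_b64 = base64.b64encode(img_bytes).decode("utf-8")
payload = json.dumps({"name": "example.png", "content": img_b64})

# Decoding reverses the two steps and restores the original bytes.
assert base64.b64decode(json.loads(payload)["content"]) == img_bytes
The key point is the round trip at the end: base64 turns bytes into text that JSON can carry, and decoding gives back the original bytes unchanged. Keep that in mind when we get to the part of the script that saves the scraped images.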