First approach – scripting
Now, let's start writing the script. I'll go through the source in three steps: imports first, then the argument parsing logic, and finally the business logic.
The imports
scrape.py (Imports)
import argparse
import base64
import json
import os
from bs4 import BeautifulSoup
import requests
Going through them from the top, you can see that we'll need to parse the arguments, which we'll feed to the script itself (argparse). We will need base64 and json to save the images within a JSON file, and we'll need to open files for writing (os). Finally, we'll need BeautifulSoup for scraping the web page easily, and requests to fetch its content. requests is an extremely popular library for performing HTTP requests, built to avoid the difficulties and quirks of using the standard library urllib module. It's based on the fast urllib3 third-party library.
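To see why base64 and json go together, here is a minimal sketch, not the script's actual code. JSON cannot hold raw bytes, so binary image data has to be turned into text first. The helper names encode_image and decode_image and the sample bytes are assumptions made up for this illustration.

```python
import base64
import json


def encode_image(name, raw_bytes):
    """Return a JSON-serializable record holding the image as Base64 text.

    Hypothetical helper, used only for this illustration.
    """
    # JSON has no binary type, so encode the bytes as Base64 and
    # turn the result into a plain ASCII string.
    encoded = base64.b64encode(raw_bytes).decode("ascii")
    return {"name": name, "data": encoded}


def decode_image(record):
    """Recover the original bytes from a record built by encode_image."""
    return base64.b64decode(record["data"])


# Stand-in bytes; a real script would use the content of a downloaded image.
sample = b"\x89PNG\r\n\x1a\n"

# Serialize a list of records to a JSON string, then read it back.
payload = json.dumps([encode_image("sample.png", sample)])
restored = decode_image(json.loads(payload)[0])
assert restored == sample
```

The round trip at the end shows that nothing is lost: the bytes that come out of the JSON string are identical to the ones that went in.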
Note
We will explore the HTTP protocol and the requests mechanism in Chapter 10, Web Development Done Right...