Understanding the structure of the internet
Before undertaking web scraping, it's important to have a basic understanding of how the internet works. Most of us interact with the web only through a browser and never see what lies behind a webpage, but beneath the images and text on screen, a good deal of code and data exchange is taking place.
When we visit a webpage, we type a web address into the browser's address bar, and the browser requests a file from a remote server. The server returns the file, and the browser displays it. The web addresses we use are URLs, or Uniform Resource Locators. An example we'll use here is
https://subscription.packtpub.com/book/IoT-and-Hardware/9781789958034
which is a page for the book MicroPython Projects, by Jacob Beningo. URLs follow a pattern, which we can use when web scraping:
[scheme]://[authority][path_to_resource]?[parameters]
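As a quick check of this pattern, the following minimal sketch splits the example URL into its parts with Python's standard-library urllib.parse module. The second URL, with its ?page=2 parameter, is a hypothetical addition made up here purely to show what the parameters part looks like; it is not taken from the site.

```python
from urllib.parse import urlsplit, parse_qs

# The example URL from the text.
url = "https://subscription.packtpub.com/book/IoT-and-Hardware/9781789958034"
parts = urlsplit(url)

print(parts.scheme)  # https
print(parts.netloc)  # subscription.packtpub.com (the authority)
print(parts.path)    # /book/IoT-and-Hardware/9781789958034
print(parts.query)   # empty string: this URL has no parameters

# Hypothetical URL with a made-up parameter, for illustration only.
url_with_params = url + "?page=2"
params = parse_qs(urlsplit(url_with_params).query)
print(params)        # {'page': ['2']}
```

Note that parse_qs returns each value as a list, because the same parameter name can appear more than once in a URL.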
In our example:
- Scheme – https
- Authority...