Performing simple web scraping
There are a few ways to perform GET and POST requests in Python; here we'll make them with the urllib and requests libraries.
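In urllib, the difference between a GET and a POST request comes down to whether a request body is supplied. The sketch below builds two urllib.request.Request objects without sending them and asks each one which method it would use. The https://example.com/search endpoint and the form field q are hypothetical placeholders chosen for illustration. Passing either Request to urlopen would actually send it.

```python
from urllib.parse import urlencode
from urllib.request import Request

# No body supplied: urllib will send this as a GET request
get_req = Request('https://en.wikipedia.org/wiki/General-purpose_programming_language')
print(get_req.get_method())  # GET

# Hypothetical endpoint and form field, used only to build (not send) a request
form = urlencode({'q': 'python'}).encode('utf-8')  # the body must be bytes
post_req = Request('https://example.com/search', data=form)
print(post_req.get_method())  # POST
print(post_req.data)          # b'q=python'
```

The requests library makes the same choice explicit with separate requests.get and requests.post functions.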
Using urllib
The simplest way to web scrape in Python is with the built-in urllib package. With it, we can easily download the content of a webpage or a file. For example, let's download the Wikipedia page for general-purpose programming languages and print out the first 50 bytes:
from urllib.request import urlopen
url = 'https://en.wikipedia.org/wiki/General-purpose_programming_language'
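page = urlopen(url).read()  # read() returns the response body as bytes
print(page[:50])  # the first 50 bytes, shown as a b'...' literal
print(page[:50].decode('utf-8'))  # the same bytes decoded into a str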
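Because read() returns raw bytes, the page has to be decoded before we can treat it as text. Hard-coding 'utf-8' works for Wikipedia, but a slightly more general approach is to use the charset declared in the response's Content-Type header and fall back to UTF-8 when none is given. The sketch below assumes that fallback is acceptable; fetch_text is a hypothetical helper name. To run offline, the demo uses a data: URL, which urlopen also accepts.

```python
from urllib.request import urlopen


def fetch_text(url, default_encoding='utf-8'):
    """Download url and return its body as a str (hypothetical helper)."""
    with urlopen(url) as response:
        raw = response.read()  # bytes, exactly as received
        # Prefer the charset from the Content-Type header; otherwise assume UTF-8
        charset = response.headers.get_content_charset() or default_encoding
    return raw.decode(charset)


# A data: URL lets us try this without network access;
# swap in a real web address to download an actual page
demo_url = 'data:text/plain;charset=utf-8,Hello%2C%20web%20scraping'
print(urlopen(demo_url).read())  # b'Hello, web scraping'
print(fetch_text(demo_url))      # Hello, web scraping
```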
We import the urlopen function from the urllib.request module (part of the Python standard library). Then we call the function with a URL as a string. urlopen accepts other arguments as well, such as data, which is used to send a POST request rather than a GET request. Notice at the end of the urlopen(...