Requesting Content from Web Pages
Whenever you visit a web page in your browser, the browser sends a request to the server to fetch the page's content. The same kind of request can be sent from a Python script, using packages such as urllib3 and requests. Let's look at an exercise to get a better understanding of this concept.
Exercise 41: Collecting Online Text Data
In this exercise, we will collect online text data with the help of requests and urllib3. Follow these steps to implement this exercise:
- Use the requests library to request the content of a book available online with the following set of commands:
import requests
r = requests.get('https://www.gutenberg.org/files/766/766-0.txt')
r.status_code
The preceding code generates the following output:
Figure 4.10: HTTP status code
Note
Here, the status code 200 (OK) indicates that the request succeeded and the server returned the requested content.
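To make the meaning of the status code more concrete, here is a minimal sketch that turns a numeric code into a readable description using only the standard library. The helper name describe_status is hypothetical and not part of requests or urllib3; the integer you pass in is assumed to be the value of r.status_code from the step above.

```python
from http import HTTPStatus


def describe_status(code: int) -> str:
    """Return a readable description such as '200 OK (success)'.

    Hypothetical helper for illustration; `code` is assumed to be an
    integer HTTP status code, for example the value of r.status_code.
    """
    try:
        # HTTPStatus knows the standard reason phrase for registered codes.
        phrase = HTTPStatus(code).phrase
    except ValueError:
        # Codes that are not registered in HTTPStatus end up here.
        phrase = "Unknown"

    # The first digit of the code determines its general class.
    if 100 <= code < 200:
        category = "informational"
    elif 200 <= code < 300:
        category = "success"
    elif 300 <= code < 400:
        category = "redirect"
    elif 400 <= code < 500:
        category = "client error"
    elif 500 <= code < 600:
        category = "server error"
    else:
        category = "non-standard"

    return f"{code} {phrase} ({category})"


if __name__ == "__main__":
    for example_code in (200, 404, 503):
        print(describe_status(example_code))
```

In a script, you would usually stop early when the code is not in the 2xx range. The requests library provides r.raise_for_status() for that purpose, which raises an exception for 4xx and 5xx responses.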
- To view the first 1,000 characters of the text content of the fetched file, write the following code:
r.text[:1000]
The preceding code generates the following...