An example dynamic web page
Let's look at an example dynamic web page. The example website has a search form, which is available at http://example.webscraping.com/search, that is used to locate countries. Let's say we want to find all the countries that begin with the letter A:
[Figure: search results for countries beginning with the letter A]
If we right-click on these results to inspect them with Firebug (as covered in Chapter 2, Scraping the Data), we find that the results are stored within a div element with the ID "results":
[Figure: the results div element inspected in Firebug]
Let's try to extract these results using the lxml module, which was also covered in Chapter 2, Scraping the Data, and the Downloader class from Chapter 3, Caching Downloads:
>>> import lxml.html
>>> from downloader import Downloader
>>> D = Downloader()
>>> html = D('http://example.webscraping.com/search')
>>> tree = lxml.html.fromstring(html)
>>> tree.cssselect('div#results a')
[]
The example scraper here has failed to extract results...
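One plausible reading of the empty list, assuming the results are added by JavaScript after the page loads (as the term "dynamic" suggests), is that the downloaded HTML contains the results container but none of its links. The sketch below reproduces that situation offline. Both HTML snippets, the link targets, and the ResultLinkParser and result_links names are hypothetical and are not taken from the example website. It uses only the standard library's html.parser rather than lxml, so it runs without a network connection or extra packages.

```python
from html.parser import HTMLParser

# Hypothetical stand-in for what a plain download might return:
# the results container exists, but nothing has been added to it.
STATIC_HTML = """
<html><body>
  <form id="search"><input name="search_term"></form>
  <div id="results"></div>
</body></html>
"""

# Hypothetical stand-in for the same page after a browser has run its script.
RENDERED_HTML = """
<html><body>
  <div id="results">
    <a href="#afghanistan">Afghanistan</a>
    <a href="#albania">Albania</a>
  </div>
</body></html>
"""


class ResultLinkParser(HTMLParser):
    """Collect the text of every <a> nested inside <div id="results">."""

    def __init__(self):
        super().__init__()
        self.div_depth = 0    # div nesting depth inside the results div (0 = outside)
        self.in_link = False  # True while inside an <a> within the results div
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            # Start counting at the results div, then track any nested divs.
            if self.div_depth or dict(attrs).get("id") == "results":
                self.div_depth += 1
        elif tag == "a" and self.div_depth:
            self.in_link = True
            self.links.append("")

    def handle_endtag(self, tag):
        if tag == "div" and self.div_depth:
            self.div_depth -= 1
        elif tag == "a":
            self.in_link = False

    def handle_data(self, data):
        if self.in_link:
            self.links[-1] += data


def result_links(html):
    """Return the link texts found inside the results div of html."""
    parser = ResultLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links


print(result_links(STATIC_HTML))
print(result_links(RENDERED_HTML))
```

Under these assumptions, the first call finds no links because the container is empty in the raw HTML, while the second finds them only because the markup already includes what a browser would have added.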