Sequential crawler
Here is the code that uses AlexaCallback with the link crawler developed earlier to download the URLs sequentially:
scrape_callback = AlexaCallback()
link_crawler(seed_url=scrape_callback.seed_url,
             cache_callback=MongoCache(),
             scrape_callback=scrape_callback)
This code is available at https://bitbucket.org/wswp/code/src/tip/chapter04/sequential_test.py and can be run from the command line as follows:
$ time python sequential_test.py
...
26m41.141s
This time is as expected for sequential downloading, at an average of about 1.6 seconds per URL.
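The per-URL average can be checked with a little arithmetic. The sketch below is only illustrative: the helper name seconds_per_url is hypothetical, and the figure of 1000 URLs is an assumption chosen because it is consistent with the reported total and average, not a number stated in this section.

```python
def seconds_per_url(minutes, seconds, num_urls):
    """Return the average wall-clock seconds spent per URL."""
    total_seconds = minutes * 60 + seconds
    return total_seconds / num_urls


# 26m41.141s is the running time reported above.
# 1000 URLs is an assumed crawl size, not a figure from this section.
average = seconds_per_url(26, 41.141, 1000)
print(round(average, 1))
```

Because a sequential crawler waits for each download to finish before starting the next, the total running time is roughly the sum of the individual download times, so it grows in proportion to the number of URLs.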