Visual scraping with Portia
Portia is an open-source tool built on top of Scrapy that lets you build a spider by clicking on the parts of a website that need to be scraped, which can be more convenient than writing the CSS selectors manually.
Installation
Portia is a powerful tool that depends on multiple external libraries. It is also relatively new, so the installation steps are currently somewhat involved. In case the installation is simplified in the future, the latest documentation can be found at https://github.com/scrapinghub/portia#running-portia.
The recommended first step is to create a virtual Python environment with virtualenv. Here, we name our environment portia_example, which can be replaced with whatever name you choose:
$ pip install virtualenv
$ virtualenv portia_example --no-site-packages
$ source portia_example/bin/activate
(portia_example)$ cd portia_example
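After activating the environment, it can be useful to confirm that Python is really running from inside it before installing anything else. The following is a minimal sketch rather than part of the Portia installation steps; the function name in_virtual_environment is hypothetical, and the check assumes that older virtualenv releases set sys.real_prefix while newer ones (and the standard venv module) make sys.base_prefix differ from sys.prefix:

```python
import sys


def in_virtual_environment():
    """Return True if the running interpreter belongs to a virtual environment."""
    # Older virtualenv releases record the original interpreter in sys.real_prefix.
    if hasattr(sys, "real_prefix"):
        return True
    # Newer virtualenv releases and the venv module keep the original location
    # in sys.base_prefix; outside an environment it equals sys.prefix.
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtual_environment())
    print("interpreter prefix:", sys.prefix)
```

Run with the environment activated, the reported prefix should point inside the portia_example directory.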
Note
Why use virtualenv?
Imagine if your project was developed with an earlier version...