colly is an open source project on GitHub that covers most of the systems discussed earlier. It is built to run on a single machine because it relies on a local cache and queuing system.
The main worker object in colly, the Collector, is built to run in its own goroutine, allowing you to run multiple Collectors simultaneously. This design lets you scrape multiple sites at the same time, each with its own parameters, such as crawl delays, whitelists and blacklists, and proxies.
colly is built to only process HTML and XML files. It does not offer support for JavaScript execution. However, you would be surprised at how much information you can collect with pure HTML. The following example is adapted from the GitHub README:
package main

import (
	"fmt"

	"github.com/gocolly/colly"
)

func main() {
c ...