Now that you have seen the progression of building fully featured web scrapers, I would like to introduce you to the most complete web scraping project written in Go to date. dataflowkit, by GitHub user slotix, is a fully featured web scraper that is modular and extensible, designed for building scalable, large-scale distributed applications. It supports multiple storage backends for cached and computed information, and it can both make simple HTTP requests and drive browsers through the DevTools Protocol. Going above and beyond, dataflowkit offers both a command-line interface and a JSON format for declaring web scraping scripts.
The architecture of dataflowkit is separated into two distinct parts: fetching and parsing. The Fetch and Parse phases are each built as a separate binary, so they can run on different machines. They...