Go Web Scraping Quick Start Guide

Implement the power of Go to scrape and crawl data from the web

Product type Paperback
Published in Jan 2019
Publisher Packt
ISBN-13 9781789615708
Length 132 pages
Edition 1st Edition
Author (1):
Vincent Smith

Distributed scraping with dataflowkit

Now that you have seen the progression of building fully featured web scrapers, I would like to introduce you to the most complete web scraping project built in Go to date. dataflowkit, by GitHub user slotix, is a fully featured web scraper that is modular and extensible, designed for building scalable, distributed applications. It supports multiple storage backends for cached and computed information, and it can both make simple HTTP requests and drive browsers through the DevTools Protocol. Beyond that, dataflowkit offers both a command-line interface and a JSON format for declaring web scraping scripts.

The architecture of dataflowkit is separated into two distinct parts: fetching and parsing. The Fetch and Parse phases of the system are built as separate binaries so that they can run on different machines. They...
