Back in Chapter 8, Graph-Based Data Processing, we designed and built our very own system for implementing graph algorithms on top of the Bulk Synchronous Parallel (BSP) model. Admittedly, our final implementation was heavily influenced by the Google paper describing Pregel [4], a system originally built by Google engineers to tackle graph-based computation at scale.
While the bspgraph package from Chapter 8, Graph-Based Data Processing, can automatically distribute the graph computation load among a pool of workers, it is still limited to running on a single compute node. As our Links 'R' Us crawler augments our link index with more and more links, we will eventually reach a point where the PageRank computation simply takes too long. Updating the PageRank scores for the entire graph might take...