Implementing the PageRank algorithm
As an example, we will use the PageRank algorithm. It is a centrality metric developed by Larry Page, Google’s co-founder, to rank results on the search engine.
In this section, we will dig into the algorithm’s mechanisms and work on a simple implementation using Python before implementing the algorithm in Java, leveraging the Pregel API.
The PageRank algorithm
This algorithm is based on the following assumptions:
- The more connections you have, the more important you are.
- Not all connections share the same weight. For example, let’s say a backlink from the New York Times is driving more traffic to your website than a backlink from a less popular website. Scores are propagated from neighbors to account for the neighbor’s importance.
- At the same time, links from a website with fewer links show more relevance. Imagine that there’s a Wikipedia article linking every single noun to the corresponding...