Introduction to GraphX
According to the Apache Spark documentation, "GraphX is Apache Spark's API for graphs and graph-parallel computation". Graph-based computations have become very popular as technology has advanced. Whether the task is finding the shortest path between two points, matching DNA, or analyzing social media relationships, graph computations have become ubiquitous.
A graph consists of vertices and edges, where the vertices represent entities (nodes) and the edges represent the relationships between those entities. Edges can be unidirectional or bidirectional, depending on the requirement. For example, an edge describing a friendship between two users on Facebook is bidirectional; however, an edge describing a follower relationship between two users on Twitter may or may not be bidirectional, because one can follow another person on Twitter without being followed back by that person.
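The difference between the two kinds of edges can be sketched in plain Java, without any Spark dependency. This is only an illustration and not the GraphX API: the class name RelationGraph, its methods, and the user names are hypothetical. The sketch assumes that a bidirectional edge can be modelled as a pair of directed edges, which is one common way to represent it.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration class; it is not part of Spark or GraphX.
class RelationGraph {
    // Adjacency sets: each vertex maps to the vertices its outgoing edges point at.
    private final Map<String, Set<String>> outgoing = new HashMap<>();

    // Adds a one-directional edge, for example "src follows dst".
    void addDirectedEdge(String src, String dst) {
        outgoing.computeIfAbsent(src, k -> new HashSet<>()).add(dst);
        // Register dst as a vertex even if it has no outgoing edges yet.
        outgoing.computeIfAbsent(dst, k -> new HashSet<>());
    }

    // Adds a bidirectional edge (for example a friendship) as two directed edges.
    void addBidirectionalEdge(String a, String b) {
        addDirectedEdge(a, b);
        addDirectedEdge(b, a);
    }

    // True if there is an edge pointing from src to dst.
    boolean hasEdge(String src, String dst) {
        return outgoing.getOrDefault(src, Collections.emptySet()).contains(dst);
    }

    // True if the relationship holds in both directions.
    boolean isMutual(String a, String b) {
        return hasEdge(a, b) && hasEdge(b, a);
    }

    public static void main(String[] args) {
        RelationGraph graph = new RelationGraph();
        graph.addBidirectionalEdge("alice", "bob"); // friendship
        graph.addDirectedEdge("carol", "alice");    // carol follows alice

        System.out.println("alice and bob mutual: " + graph.isMutual("alice", "bob"));
        System.out.println("carol and alice mutual: " + graph.isMutual("carol", "alice"));
    }
}
```

In this sketch the friendship is visible from both sides, while the follower relationship can only be traversed from carol to alice.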
The Spark GraphX library helps to run graph-based computations on top of Spark. It provides a graph-based abstraction on the Spark RDD called a Property...