Introducing GraphFrames
While the GraphX framework is based on the RDD API, GraphFrames is an external Spark package built on top of the DataFrame API. It inherits the performance advantages of DataFrames, including query optimization by the Catalyst optimizer. GraphFrames can be used from Java, Scala, and Python, whereas GraphX does not provide a Python API. It also adds functionality that GraphX lacks, such as motif finding, DataFrame-based serialization, and graph queries.
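To make motif finding more concrete, the following is a minimal plain-Scala sketch of what a motif query computes; it does not use Spark or GraphFrames. It assumes the pattern "(a)-[e]->(b); (b)-[e2]->(a)", which in GraphFrames would be passed to the find method of a graph, and it emulates that pattern with a self-join over an edge list. The MotifSketch object, the Edge case class, the mutualPairs helper, and the edge data are all hypothetical and exist only for illustration.

```scala
object MotifSketch {
  // Hypothetical edge type: a directed edge from src to dst.
  final case class Edge(src: String, dst: String)

  // Hypothetical illustration data, not taken from the chapter's graph.
  val edges: List[Edge] = List(
    Edge("1", "2"),
    Edge("2", "1"),
    Edge("2", "3"),
    Edge("3", "4"),
    Edge("4", "3")
  )

  // Emulates the motif "(a)-[e]->(b); (b)-[e2]->(a)":
  // keep every edge e for which a reverse edge e2 exists.
  // Each mutual pair appears once per direction, as (a, b) and (b, a).
  def mutualPairs(es: List[Edge]): List[(String, String)] =
    for {
      e  <- es
      e2 <- es
      if e.dst == e2.src && e2.dst == e.src
    } yield (e.src, e.dst)

  def main(args: Array[String]): Unit =
    mutualPairs(edges).foreach { case (a, b) => println(s"$a <-> $b") }
}
```

In GraphFrames itself, the same idea is expressed declaratively as a pattern string, and the matching is carried out as DataFrame operations rather than as an in-memory loop like the one sketched here.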
It is easy to get started with GraphFrames. On a Spark 2.0 cluster, let's start a Spark shell with the --packages option and then build the graph from the same data used in the Creating a graph section of this chapter:
$SPARK_HOME/bin/spark-shell --packages graphframes:graphframes:0.2.0-spark2.0-s_2.11

import org.graphframes._

val vertex = spark.createDataFrame(List(
  ("1","Jacob",48),
  ("2","Jessica",45),
  ("3","Andrew",25),
  ("4","Ryan",53),
  ("5","Emily",22)...