Installing GraphFrames
Under the hood, a GraphFrame is built on two Spark DataFrames: one for the vertices and the other for the edges. GraphFrames can be thought of as the next generation of Spark's GraphX library, with some major improvements over the latter:
- GraphFrames leverages the performance optimizations and simplicity of the DataFrame API.
- Because it is built on the DataFrame API, GraphFrames can be used through Python, Java, and Scala APIs. In contrast, GraphX is available only through the Scala interface.
You can find the latest information on GraphFrames within the GraphFrames overview at https://graphframes.github.io/.
Getting ready
We require a working installation of Spark, which means you should have followed the steps outlined in Chapter 1, Installing and Configuring Spark. As a reminder, to start the PySpark shell for your local Spark cluster, run the following command:
./bin/pyspark --master local[n]
Where n is the number of cores.
How to do it...
If you are running your...