Building the graph
In the preceding sections, you installed GraphFrames and built the DataFrames required for the graph; now, you can start building the graph itself.
How to do it...
The first step in this recipe is to import the necessary libraries: the PySpark SQL functions (pyspark.sql.functions) and GraphFrames (graphframes). In the previous recipe, we created the src and dst columns as part of building the deptsDelays_geo DataFrame. When building a graph, GraphFrames looks specifically for the src and dst columns in the edges DataFrame, edges, to define each edge. Similarly, GraphFrames looks for an id column in the vertices DataFrame to identify each vertex (and to join against the src and dst columns). Therefore, when creating the vertices DataFrame, vertices, we rename the IATA column to id:
from pyspark.sql.functions import *
from graphframes import *

# Create Vertices (airports) and Edges (flights)
vertices = airports.withColumnRenamed("IATA", "id").distinct()
edges = deptsDelays_geo...
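To make the naming contract concrete, here is a minimal sketch in plain Python (not Spark) that mimics the rename and deduplication steps and then checks that every src and dst value matches a vertex id. The sample rows and the helpers rename_key, distinct_rows, and dangling_endpoints are hypothetical and exist only for illustration; the recipe itself works on the airports and deptsDelays_geo DataFrames.

```python
# Toy stand-ins for the airport and flight rows (hypothetical data).
airports = [
    {"IATA": "SEA", "City": "Seattle"},
    {"IATA": "SFO", "City": "San Francisco"},
    {"IATA": "SEA", "City": "Seattle"},  # duplicate row, removed below
]
flights = [
    {"src": "SEA", "dst": "SFO", "delay": 10},
    {"src": "SFO", "dst": "JFK", "delay": -5},  # JFK is not in airports
]


def rename_key(rows, old, new):
    """Return copies of the rows with key `old` renamed to `new`."""
    return [{(new if k == old else k): v for k, v in row.items()} for row in rows]


def distinct_rows(rows):
    """Drop duplicate rows while keeping first-seen order."""
    seen, out = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            out.append(row)
    return out


def dangling_endpoints(vertices, edges):
    """Return sorted src/dst values that have no matching vertex id."""
    ids = {v["id"] for v in vertices}
    endpoints = {e[col] for e in edges for col in ("src", "dst")}
    return sorted(endpoints - ids)


# Plain-Python analogue of renaming IATA to id and calling distinct().
vertices = distinct_rows(rename_key(airports, "IATA", "id"))
edges = flights
missing = dangling_endpoints(vertices, edges)
print(vertices)
print(missing)
```

A check along these lines can help spot edges that reference airports absent from the vertex set before the graph is built.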