Running queries against the graph
Now that you have created your graph, start off by creating and running some simple queries against your GraphFrame.
Getting ready
Ensure that you have created the graph GraphFrame (derived from the vertices and edges DataFrames) from the previous section.
How to do it...
Let's start with some simple count queries to determine the number of airports (nodes or vertices; remember?) and the number of flights (the edges), which can be determined by applying count(). The call to count() is similar to calling it on a DataFrame, except that you also need to specify whether you are counting vertices or edges:
print("Airports count: %d" % graph.vertices.count())
print("Trips count: %d" % graph.edges.count())
The output of these queries should be similar to the following output, denoting the 279 vertices (that is, airports) and more than 1.3 million edges (that is, flights):
Output:
Airports count: 279
Trips count: 1361141
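To make clear what the two counts measure, here is a minimal plain-Python sketch that runs without Spark. The toy airports and trips are hypothetical and are not the recipe's dataset; the lists of dictionaries merely stand in for the vertices and edges DataFrames, and len() plays the role that count() plays on a DataFrame.

```python
# Hypothetical toy data standing in for the vertices and edges DataFrames.
# Each vertex is an airport; each edge is one flight between two airports.
vertices = [
    {"id": "SEA", "city": "Seattle"},
    {"id": "SFO", "city": "San Francisco"},
    {"id": "JFK", "city": "New York"},
]
edges = [
    {"src": "SEA", "dst": "SFO"},
    {"src": "SEA", "dst": "JFK"},
    {"src": "SFO", "dst": "JFK"},
    {"src": "SFO", "dst": "SEA"},
]

# Counting vertices and counting edges are separate queries,
# which is why the recipe calls count() on each collection in turn.
airport_count = len(vertices)
trip_count = len(edges)

print("Airports count: %d" % airport_count)
print("Trips count: %d" % trip_count)
```

The point of the sketch is only that the two numbers are independent: one airport can appear in many flights, so the edge count is typically far larger than the vertex count, as in the recipe's output.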
Similar to DataFrames, you can also execute the filter and groupBy...