Executing simple queries
Let's start with a set of simple graph queries to understand flight performance and departure delays.
Determining the number of airports and trips
For example, to determine the number of airports and trips, you can run the following commands:
print("Airports: %d" % tripGraph.vertices.count())
print("Trips: %d" % tripGraph.edges.count())
The results show 279 airports and 1.36 million trips.
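As a rough mental model of what these two counts measure, here is a plain-Python sketch over a tiny, made-up graph. Vertices are airports keyed by an `id` field, and edges are trips with `src`, `dst`, and `delay` fields, following the GraphFrames column convention. The data is hypothetical; on the real `tripGraph`, `vertices.count()` and `edges.count()` run as distributed Spark jobs rather than `len()` calls.

```python
# Hypothetical, tiny stand-in for tripGraph: airports as vertices, trips as edges.
vertices = [
    {"id": "SEA"},
    {"id": "SFO"},
    {"id": "JFK"},
]
edges = [
    {"src": "SEA", "dst": "SFO", "delay": 15},
    {"src": "SFO", "dst": "JFK", "delay": -3},
    {"src": "JFK", "dst": "SEA", "delay": 0},
    {"src": "SEA", "dst": "JFK", "delay": 42},
]


def count_airports_and_trips(vertices, edges):
    """Return (number of vertices, number of edges), mirroring vertices.count() and edges.count()."""
    return len(vertices), len(edges)


airports, trips = count_airports_and_trips(vertices, edges)
print("Airports: %d" % airports)
print("Trips: %d" % trips)
```

Note that several trips can share the same pair of airports, which is why the trip count can be far larger than the airport count, as in the real dataset.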
Determining the longest delay in this dataset
To find the longest delay in the dataset, run the following query. The result is 1,642 minutes (more than 27 hours!):
tripGraph.edges.groupBy().max("delay").show()
# Output
+----------+
|max(delay)|
+----------+
|      1642|
+----------+
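Calling `groupBy()` with no columns treats the whole edge DataFrame as a single group, so `max("delay")` is a global maximum. The plain-Python sketch below mirrors that on a few hypothetical trip records (one of which reuses the 1,642-minute figure) and splits the result into hours and minutes to check the "more than 27 hours" arithmetic.

```python
# Hypothetical trip records; only the "delay" field (in minutes) matters here.
edges = [
    {"src": "SEA", "dst": "SFO", "delay": 15},
    {"src": "SFO", "dst": "JFK", "delay": -3},
    {"src": "JFK", "dst": "SEA", "delay": 1642},
]


def max_delay(edges):
    """Global maximum of the delay field, like edges.groupBy().max("delay")."""
    return max(e["delay"] for e in edges)


def minutes_to_hours(minutes):
    """Split a duration in minutes into (hours, remaining minutes)."""
    return divmod(minutes, 60)


longest = max_delay(edges)
hours, mins = minutes_to_hours(longest)
print("Longest delay: %d minutes (%d h %d min)" % (longest, hours, mins))
```

For 1,642 minutes this prints `Longest delay: 1642 minutes (27 h 22 min)`.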
Determining the number of delayed versus on-time/early flights
To determine the number of delayed versus on-time (or early) flights, you can run the following queries:
print("On-time / Early Flights: %d" % tripGraph.edges.filter("delay <= 0").count())
print...
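The same split can be sketched in plain Python. Trips with `delay <= 0` count as on-time or early, matching the filter above; for the truncated second query, this sketch assumes delayed trips are those with `delay > 0`. The trip records are hypothetical.

```python
# Hypothetical trip records with delays in minutes (negative = departed early).
edges = [
    {"src": "SEA", "dst": "SFO", "delay": 15},
    {"src": "SFO", "dst": "JFK", "delay": -3},
    {"src": "JFK", "dst": "SEA", "delay": 0},
    {"src": "SEA", "dst": "JFK", "delay": 42},
]


def split_by_delay(edges):
    """Count on-time/early trips (delay <= 0) and delayed trips (assumed here: delay > 0)."""
    on_time = sum(1 for e in edges if e["delay"] <= 0)
    delayed = sum(1 for e in edges if e["delay"] > 0)
    return on_time, delayed


on_time, delayed = split_by_delay(edges)
print("On-time / Early Flights: %d" % on_time)
print("Delayed Flights: %d" % delayed)
```

Because the two predicates are complementary, the counts add up to the total number of trips, as long as no delay values are missing (Spark's `filter` drops rows whose `delay` is null from both counts).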