Preparing your flights dataset
For this flights sample scenario, we will use two datasets:
Airline On-Time Performance and Causes of Flight Delays: [http://bit.ly/2ccJPPM] This dataset contains scheduled and actual departure and arrival times, and delay causes as reported by US air carriers. The data is collected by the Office of Airline Information, Bureau of Transportation Statistics (BTS).
Open Flights: Airports and airline data: [http://openflights.org/data.html] This dataset contains a list of US airports, including the IATA code, airport name, and airport location.
We will create two DataFrames, airports and departureDelays, which will make up the vertices and edges of our GraphFrame, respectively. We will be creating this flights sample application using Python.
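As a rough sketch of this step, the two DataFrames could be read as shown here. The helper name load_flight_frames, the path parameters, and the delimiter and header options are assumptions made for illustration and are not taken from the dataset documentation; spark stands for the SparkSession that a Databricks notebook provides.

```python
def load_flight_frames(spark, airports_path, delays_path):
    """Hypothetical helper: read the two source files into DataFrames.

    `spark` is an existing SparkSession; both paths are supplied by the
    caller and are placeholders, not confirmed file locations.
    """
    # Airport data (future vertices). Assumption: tab-delimited text
    # with a header row; adjust `sep` if the file is laid out differently.
    airports = spark.read.csv(
        airports_path, header=True, inferSchema=True, sep="\t"
    )

    # Departure delay data (future edges). Assumption: comma-delimited
    # CSV with a header row.
    departure_delays = spark.read.csv(delays_path, header=True)

    return airports, departure_delays
```

In a notebook this might be called as airports, departureDelays = load_flight_frames(spark, airports_path, delays_path), with both paths pointing at the relevant files. GraphFrames expects the vertices DataFrame to have an id column and the edges DataFrame to have src and dst columns, so some renaming is typically needed before the graph is built.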
As we are using a Databricks notebook for our example, we can make use of the /databricks-datasets/ location, which contains numerous sample datasets. You can also download the data from:
departureDelays.csv...