Part 1 – Loading the US domestic flight data into a graph
To initialize the Notebook, let's run the following code in its own cell to import the packages we'll be using heavily throughout the rest of this chapter:
import pixiedust
import networkx as nx
import pandas as pd
import matplotlib.pyplot as plt
We'll also be using the 2015 Flight Delays and Cancellations dataset available on the Kaggle website at this location: https://www.kaggle.com/usdot/datasets. The dataset is composed of three files:
airports.csv: List of all U.S. airports including their IATA code (International Air Transport Association: https://openflights.org/data.html), city, state, longitude, and latitude.
airlines.csv: List of U.S. airlines including their IATA code.
flights.csv: List of flights that occurred in 2015. This data includes date, origin and destination airports, scheduled and actual times, and delays.
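To make the goal of this part more concrete, here is a minimal sketch of one way to turn the airport and flight tables into a directed NetworkX graph, with airports as nodes and routes as edges. It assumes the column names used in the Kaggle files (IATA_CODE, CITY, STATE, LATITUDE, LONGITUDE in airports.csv; ORIGIN_AIRPORT, DESTINATION_AIRPORT, DISTANCE in flights.csv). The helper build_flight_graph and the small sample rows are hypothetical and only illustrate the idea; they are not the chapter's own code.

```python
import networkx as nx
import pandas as pd


def build_flight_graph(airports: pd.DataFrame, flights: pd.DataFrame) -> nx.DiGraph:
    """Hypothetical helper: airports become nodes, aggregated routes become edges.

    Assumes Kaggle-style column names (IATA_CODE, ORIGIN_AIRPORT, ...).
    """
    # Collapse individual flights into one row per (origin, destination) route,
    # counting the flights and keeping the route distance.
    routes = (
        flights.groupby(["ORIGIN_AIRPORT", "DESTINATION_AIRPORT"])
        .agg(NUM_FLIGHTS=("DISTANCE", "size"), DISTANCE=("DISTANCE", "first"))
        .reset_index()
    )
    # Directed graph: a route from A to B is distinct from a route from B to A.
    graph = nx.from_pandas_edgelist(
        routes,
        source="ORIGIN_AIRPORT",
        target="DESTINATION_AIRPORT",
        edge_attr=["NUM_FLIGHTS", "DISTANCE"],
        create_using=nx.DiGraph(),
    )
    # Attach airport metadata to each node, keyed by IATA code.
    # Airports with no flights are not in the graph and are skipped.
    attrs = (
        airports.set_index("IATA_CODE")[["CITY", "STATE", "LATITUDE", "LONGITUDE"]]
        .to_dict("index")
    )
    nx.set_node_attributes(graph, attrs)
    return graph


# Illustrative sample rows (not taken from the real dataset files).
airports = pd.DataFrame({
    "IATA_CODE": ["BOS", "JFK", "SFO"],
    "CITY": ["Boston", "New York", "San Francisco"],
    "STATE": ["MA", "NY", "CA"],
    "LATITUDE": [42.36, 40.64, 37.62],
    "LONGITUDE": [-71.01, -73.78, -122.37],
})
flights = pd.DataFrame({
    "ORIGIN_AIRPORT": ["BOS", "BOS", "JFK"],
    "DESTINATION_AIRPORT": ["JFK", "JFK", "SFO"],
    "DISTANCE": [187, 187, 2586],
})

g = build_flight_graph(airports, flights)
print(g.number_of_nodes(), g.number_of_edges())  # 3 2
print(g["BOS"]["JFK"]["NUM_FLIGHTS"])  # 2
```

With the real data, the two DataFrames would come from pd.read_csv on airports.csv and flights.csv. Aggregating per route keeps the graph compact: each origin/destination pair becomes a single edge carrying a flight count, rather than one edge per flight out of millions.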
The flights.csv file contains close to 6 million records, which need to be cleaned up to remove all flights...