Let's dive right into the dataset. The instructions to download the NYC taxi fares dataset can be found in the accompanying GitHub repository for the book (refer to the Technical requirements section). Unlike in the previous chapter, Chapter 2, Predicting Diabetes with Multilayer Perceptrons, we're not going to import the entire original dataset, which has 55 million rows. In fact, most computers would not be able to hold the entire dataset in memory! Instead, let's import just the first 500,000 rows. This has a drawback, since we will be working with less than 1% of the available data, but it is a necessary tradeoff in order to work with the dataset efficiently.
To do this, call the read_csv() function from pandas, using the nrows parameter to limit how many rows are read:
import pandas as pd
df = pd.read_csv('NYC_taxi.csv', parse_dates=['pickup_datetime'], nrows=500000)
The parse_dates parameter in read_csv allows pandas to easily...
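To see what the nrows and parse_dates arguments do without downloading the full file, here is a minimal sketch that runs against a tiny in-memory CSV. The values in csv_text are made up for illustration, and the fare_amount column name is an assumption about the file layout rather than something confirmed above.

```python
import io

import pandas as pd

# Hypothetical miniature stand-in for NYC_taxi.csv (made-up values).
csv_text = """pickup_datetime,fare_amount
2015-01-01 00:15:00,7.5
2015-01-01 08:30:00,12.0
2015-01-02 17:45:00,5.5
2015-01-03 09:00:00,9.0
2015-01-04 23:10:00,20.0
"""

# nrows=3 stops reading after the first three data rows;
# parse_dates converts the named column to a datetime type.
sample = pd.read_csv(
    io.StringIO(csv_text),
    parse_dates=['pickup_datetime'],
    nrows=3,
)

# Without parse_dates, the same column is left as plain text.
plain = pd.read_csv(io.StringIO(csv_text), nrows=3)

print(len(sample))                       # number of rows actually loaded
print(sample['pickup_datetime'].dtype)   # a datetime dtype
print(plain['pickup_datetime'].dtype)    # a non-datetime (text) dtype

# A datetime column exposes the .dt accessor, e.g. the pickup hour.
print(sample['pickup_datetime'].dt.hour.tolist())
```

Once the column is parsed as a datetime, components such as the hour or day of the week can be read through the .dt accessor instead of being extracted by slicing strings.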