We have accomplished a lot in this chapter. Let's do a quick recap of the code that we have written so far.
We started off by defining a preprocess function. It takes a DataFrame as input and performs the following actions:
- Removing missing values
- Removing outliers in the fare amount
- Replacing outliers in passenger count with the mode
- Removing outliers in latitude and longitude (that is, only considering points within NYC)
This function is saved in utils.py in our project folder.
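The four cleaning steps above can be sketched roughly as follows. This is a minimal illustration, not necessarily the exact code written earlier in the chapter. The column names (fare_amount, passenger_count, pickup_/dropoff_ latitude and longitude) are assumed to follow the NYC taxi fare dataset layout. The fare range, the rule that a passenger count of 0 counts as an outlier, and the NYC bounding box are illustrative assumptions.

```python
import pandas as pd

# Assumed column names (NYC taxi fare dataset layout); not given in the recap.
LON_COLS = ["pickup_longitude", "dropoff_longitude"]
LAT_COLS = ["pickup_latitude", "dropoff_latitude"]

# Illustrative thresholds, assumed for this sketch only.
FARE_MIN, FARE_MAX = 0, 100
NYC_LON = (-74.05, -73.75)  # approximate NYC longitude range
NYC_LAT = (40.63, 40.85)    # approximate NYC latitude range


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Clean a taxi-ride DataFrame using the four steps listed above."""
    # 1. Remove rows containing any missing value.
    df = df.dropna()

    # 2. Remove fare outliers: keep fares within the assumed plausible range.
    df = df[df["fare_amount"].between(FARE_MIN, FARE_MAX)]

    # 3. Treat a passenger count of 0 as an outlier (assumption) and replace
    #    it with the mode of the remaining, non-zero counts.
    valid = df["passenger_count"] > 0
    mode = df.loc[valid, "passenger_count"].mode()[0]
    df = df.assign(passenger_count=df["passenger_count"].where(valid, mode))

    # 4. Keep only rides whose pickup and drop-off points both fall inside
    #    the approximate NYC bounding box.
    for col in LON_COLS:
        df = df[df[col].between(*NYC_LON)]
    for col in LAT_COLS:
        df = df[df[col].between(*NYC_LAT)]

    return df
```

With the function saved in utils.py, other scripts in the project folder can import it with `from utils import preprocess`. Each step reassigns `df` instead of modifying it in place, which avoids pandas warnings about writing to a filtered copy.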
Next, we defined a feature_engineer function. It takes a DataFrame as input and performs the following actions:
- Creating new columns for year, month, day, day of the week, and hour
- Creating a new column for the Euclidean distance between the pickup and drop-off points
- Creating new columns for the...
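The first two feature engineering steps above can be sketched as follows. The list is cut off at its last item, so that step is not sketched here. This is a minimal illustration under stated assumptions, not necessarily the chapter's exact code. The pickup_datetime and coordinate column names are assumed to follow the NYC taxi fare dataset. The output column names (year, month, day, day_of_week, hour, distance) and the helper euclidean_distance are hypothetical.

```python
import numpy as np
import pandas as pd


def euclidean_distance(lat1, lon1, lat2, lon2):
    """Hypothetical helper: straight-line distance in degree units.

    The result is not in kilometres or miles; it treats latitude and
    longitude as flat x/y coordinates.
    """
    return np.sqrt((lat1 - lat2) ** 2 + (lon1 - lon2) ** 2)


def feature_engineer(df: pd.DataFrame) -> pd.DataFrame:
    """Add date/time and distance features (a partial sketch)."""
    df = df.copy()  # avoid mutating the caller's DataFrame

    # Date/time features from the pickup timestamp (column name assumed).
    ts = pd.to_datetime(df["pickup_datetime"])
    df["year"] = ts.dt.year
    df["month"] = ts.dt.month
    df["day"] = ts.dt.day
    df["day_of_week"] = ts.dt.dayofweek  # Monday=0 ... Sunday=6
    df["hour"] = ts.dt.hour

    # Euclidean distance between the pickup and drop-off points.
    df["distance"] = euclidean_distance(
        df["pickup_latitude"], df["pickup_longitude"],
        df["dropoff_latitude"], df["dropoff_longitude"],
    )
    return df
```

Because the distance is measured in raw degrees, one degree of longitude covers less ground than one degree of latitude at NYC's latitude. The feature is therefore only a rough proxy for trip length, though it is cheap to compute and monotonic along either axis.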