One of the benefits of deep learning is that it largely removes the need for the feature engineering you may be used to from traditional machine learning. That said, the data still needs to be prepared before we begin modeling. Let's review the goals of preparing data for modeling:
- Remove no-information and extremely low-information variables
- Identify dates and extract date parts
- Handle missing values
- Handle outliers
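
The four goals above can be sketched in a few lines. The example below is a minimal illustration in Python with pandas, not the chapter's own code. The column names (`site`, `date`, `value`), the toy readings, the median fill, and the 1.5 × IQR clipping rule are all assumptions chosen for illustration; other imputation and outlier strategies may suit your data better.

```python
import numpy as np
import pandas as pd


def prepare(df, date_col="date", value_col="value"):
    """Apply the four preparation goals to a small readings table.

    Column names are hypothetical defaults, not taken from any real dataset.
    """
    out = df.copy()

    # 1. Remove no-information variables: columns with a single distinct value.
    constant_cols = [c for c in out.columns if out[c].nunique(dropna=False) <= 1]
    out = out.drop(columns=constant_cols)

    # 2. Identify dates and extract date parts.
    out[date_col] = pd.to_datetime(out[date_col])
    out["month"] = out[date_col].dt.month
    out["day_of_week"] = out[date_col].dt.dayofweek  # Monday is 0
    out["hour"] = out[date_col].dt.hour

    # 3. Handle missing values (assumption: fill with the column median).
    out[value_col] = out[value_col].fillna(out[value_col].median())

    # 4. Handle outliers (assumption: clip to the 1.5 * IQR fences).
    q1 = out[value_col].quantile(0.25)
    q3 = out[value_col].quantile(0.75)
    iqr = q3 - q1
    out[value_col] = out[value_col].clip(q1 - 1.5 * iqr, q3 + 1.5 * iqr)

    return out


# Made-up hourly readings: one constant column, one gap, one extreme value.
toy = pd.DataFrame({
    "site": ["X"] * 6,
    "date": [
        "2018-01-01 00:00", "2018-01-01 01:00", "2018-01-01 02:00",
        "2018-01-01 03:00", "2018-01-01 04:00", "2018-01-01 05:00",
    ],
    "value": [30.0, 32.0, np.nan, 31.0, 500.0, 29.0],
})

prepared = prepare(toy)
print(prepared)
```

In this toy table the constant `site` column is dropped, the gap is filled with the median, and the extreme reading is pulled back to the upper fence. Clipping keeps every row, which matters for time series where dropping an observation would leave a hole in the sequence.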
In this chapter, we will investigate air quality data provided by the London Air Quality Network. Specifically, we will look at nitrogen dioxide readings from the Tower Hamlets (Mile End Road) site during 2018. This is a very small dataset, with only a few features and approximately 35,000 observations. We are using a limited dataset here so that all of our code, even our modeling, runs quickly. That said...