Chapter 9: Data Modeling – Preprocessing
In this chapter, you will learn two important processes used to prepare data for modeling: splitting and scaling. You will learn how to use the sklearn classes StandardScaler and MinMaxScaler for scaling, and the sklearn function train_test_split for splitting. You will also be introduced to the reasons behind scaling and to exactly what these tools do to the data. As part of exploring splitting and scaling, you will use sklearn's LinearRegression and statsmodels to create simple linear regression models.
By the end of this chapter, you will be comfortable preparing datasets to begin modeling. The main ideas you will learn in this chapter are as follows:
- Exploring independent and dependent variables
- Understanding data scaling and normalization
- Activity 9.01 – Data splitting, scaling, and modeling
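
As a preview of how the tools named above fit together, here is a minimal sketch of the split, scale, and fit workflow. The synthetic data, the variable names, the 25% test size, and the random seeds are assumptions made for illustration; they are not taken from the chapter's exercises. statsmodels is left out of the sketch to keep it short.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical data: two independent variables on very different scales
# and a dependent variable that is a noisy linear function of them.
rng = np.random.default_rng(42)
X = rng.uniform(low=[0, 100], high=[1, 1000], size=(200, 2))
y = 3.0 * X[:, 0] + 0.01 * X[:, 1] + rng.normal(scale=0.1, size=200)

# Split first, so the test set plays no part in fitting the scaler.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# StandardScaler: subtract the training mean and divide by the training
# standard deviation, column by column.
std_scaler = StandardScaler()
X_train_std = std_scaler.fit_transform(X_train)
X_test_std = std_scaler.transform(X_test)  # reuse the training statistics

# MinMaxScaler: map each column of the training data onto [0, 1].
mm_scaler = MinMaxScaler()
X_train_mm = mm_scaler.fit_transform(X_train)

# Fit a simple linear regression on the standardized training data and
# report R^2 on the held-out test data.
model = LinearRegression().fit(X_train_std, y_train)
r2 = model.score(X_test_std, y_test)
print(f"Test R^2: {r2:.3f}")
```

Note that each scaler is fitted on the training data only and then applied to the test data, so that no information from the test set influences the preprocessing.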