Using classical linear regression
In this section, we will specify a fairly straightforward linear model and use it to predict a country's implied gasoline tax from several national economic and political measures. Before we specify the model, we need to complete the pre-processing tasks we discussed in the first few chapters of this book.
Pre-processing the data for our regression model
We will use pipelines to pre-process our data in this chapter and throughout the rest of this book. We need to impute missing values, identify and handle outliers, and encode and scale our data. We also need to do this without data leakage: every transformation should be learned from the training data alone, without peeking ahead at the testing data. As we saw in Chapter 6, Preparing for Model Evaluation, scikit-learn’s pipelines can help with these tasks.
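To make the leakage point concrete, here is a minimal sketch of how a scikit-learn pipeline keeps pre-processing confined to the training data. It is not the pipeline built later in this chapter: the data is synthetic, the column names (income, urban_share, region, target) are hypothetical stand-ins, and outlier handling is left out for brevity.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in data; every column name here is hypothetical.
rng = np.random.default_rng(0)
n = 60
df = pd.DataFrame({
    "income": rng.normal(50, 10, n),
    "urban_share": rng.uniform(0, 1, n),
    "region": rng.choice(["a", "b", "c"], n),
})
df["target"] = 0.05 * df["income"] + rng.normal(0, 0.5, n)
df.loc[::7, "income"] = np.nan  # introduce some missing values

# Split first, so nothing is learned from the testing data.
X_train, X_test, y_train, y_test = train_test_split(
    df.drop(columns="target"), df["target"], test_size=0.25, random_state=0
)

num_cols = ["income", "urban_share"]
cat_cols = ["region"]

# Numeric columns: median imputation, then standardization.
# Categorical columns: one-hot encoding.
preprocess = ColumnTransformer([
    ("num", make_pipeline(SimpleImputer(strategy="median"), StandardScaler()), num_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), cat_cols),
])

model = Pipeline([
    ("preprocess", preprocess),
    ("regressor", LinearRegression()),
])

# fit() learns medians, scaling parameters, and categories from X_train only;
# predict() reuses those learned values on X_test.
model.fit(X_train, y_train)
preds = model.predict(X_test)
```

Because the imputer, scaler, and encoder sit inside the pipeline, calling fit on the training split is the only point at which their parameters are estimated. The same pipeline object can be passed to cross-validation utilities, which then refit the pre-processing steps within each fold.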
The dataset we will use contains the implied gasoline tax for each country and some possible predictors, including national income...