Modeling and evaluation
With the data prepared, we can begin the modeling process. For comparison purposes, we will first create a model using best subsets regression, as in the previous two chapters, and then apply the regularization techniques.
Best subsets
The following code is, for the most part, a rehash of what we developed in Chapter 2, Linear Regression – The Blocking and Tackling of Machine Learning. We will create the best subset object using the regsubsets() command and specify the train portion of the data. The variables that are selected will then be used in a model on the test set, which we will evaluate with a mean squared error calculation.
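As a rough illustration of the evaluation step described here, the following sketch fits a model on a training portion and computes the mean squared error on a held-out portion. It uses only base R; the simulated data frame toy, its columns x1, x2, and y, and the 70/30 split are assumptions made up for this example and are not the prostate data used in the chapter.

```r
# Hypothetical toy data standing in for the chapter's train/test split
set.seed(123)
n = 100
toy = data.frame(x1 = rnorm(n), x2 = rnorm(n))
toy$y = 1 + 2 * toy$x1 + rnorm(n)

# Assumed split: first 70 rows for training, last 30 for testing
train.toy = toy[1:70, ]
test.toy = toy[71:100, ]

# Fit on the training rows only, using the variable we pretend was selected
fit = lm(y ~ x1, data = train.toy)

# Predict on the unseen test rows and average the squared residuals
pred = predict(fit, newdata = test.toy)
mse = mean((test.toy$y - pred)^2)
print(mse)
```

The same last two lines apply to any fitted model that has a predict() method, which makes the test-set errors of different models directly comparable.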
The model that we are building is written out as lpsa~. with the tilde and period stating that we want to use all the remaining variables in our data frame, with the exception of the response, as predictors. The code is as follows:
> subfit = regsubsets(lpsa~., data=train)
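To see what the tilde-and-period notation expands to, the following base R sketch asks terms() to resolve the formula against a data frame. The three-row data frame toy and its values are invented for illustration; only the column names echo the kind of variables found in the data.

```r
# Hypothetical miniature data frame; the values are made up
toy = data.frame(lpsa = c(1.2, 2.3, 3.1),
                 lcavol = c(0.5, 1.1, 1.9),
                 age = c(50, 63, 71))

# terms() expands the period into every column except the response
tt = terms(lpsa ~ ., data = toy)
predictors = attr(tt, "term.labels")
print(predictors)
# [1] "lcavol" "age"
```

The response lpsa does not appear among the predictors, which is why the period can be used safely without listing each variable by name.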
With the model built, you can produce the best subset with two lines of code. The first one turns...