Modeling and evaluation
With the data prepared, we can begin the modeling process. For comparison purposes, we will first create a model using best subsets regression, as in the previous two chapters, and then apply the regularization techniques.
Best subsets
The following code is, for the most part, a rehash of what we developed in Chapter 2, Linear Regression - The Blocking and Tackling of Machine Learning. We will create the best subset object using the regsubsets() command and specify the train portion of the data. The variables that are selected will then be used in a model on the test set, which we will evaluate with a mean squared error calculation.
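The mean squared error mentioned here is the average of the squared differences between the observed and predicted responses. As a minimal sketch in base R, the helper name mse() and the numbers below are hypothetical and made up purely for illustration; they do not come from the prostate data:

```r
# Hypothetical helper: mean squared error between observed and predicted values
mse <- function(actual, predicted) {
  stopifnot(length(actual) == length(predicted))
  mean((actual - predicted)^2)
}

# Made-up values for illustration only (not taken from the prostate data)
actual <- c(1.0, 2.0, 3.0)
predicted <- c(1.5, 2.0, 2.0)

# Squared errors are 0.25, 0 and 1, so the result is their average
toy_mse <- mse(actual, predicted)
toy_mse
```

A lower value means the predictions sit closer to the observed responses, which is why the same calculation can be used to compare models fitted by different methods on the same test set.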
The model that we are building is written as lpsa ~ ., where the tilde and period state that we want to use every variable in our data frame other than the response as a predictor:
> subfit <- regsubsets(lpsa ~ ., data = train)
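To see what the period in the formula expands to, it can help to inspect the terms of the formula against a data frame. The following is a small sketch in base R, assuming a toy data frame named toy that stands in for train; its values are invented for illustration and only the formula syntax mirrors the code above:

```r
# Toy data frame standing in for train (all values are made up)
toy <- data.frame(
  lpsa   = c(0.5, 1.2, 2.3, 3.1),
  lcavol = c(-0.6, 0.4, 1.5, 2.2),
  age    = c(50, 58, 64, 70)
)

# terms() expands the period into every column except the response
predictors <- attr(terms(lpsa ~ ., data = toy), "term.labels")
predictors
```

Here predictors contains the two non-response columns, lcavol and age, which is the set of candidate variables a formula of this form would hand to the model-fitting function.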
With the model built, you can produce the best subset with two lines of code. The first one turns the summary...