Fitting a multiple linear model with R
It is now time to apply everything we have seen so far to our data. First, we will fit our model using the previously introduced lm() function. This will not take long, and it leads directly to validating the model assumptions, covering both multicollinearity and residual behavior. Finally, in search of the best possible model, we will apply both stepwise regression and principal component regression.
Model fitting
Let us define the dataset we are going to use for our modeling activity. We will use clean_casted_stored_data_validated_complete, first removing default_flag and customer_code, the latter because it is meaningless as an explanatory variable:
clean_casted_stored_data_validated_complete %>% select(-default_flag) %>% select(-customer_code) -> training_data
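To see the column removal in isolation, here is a minimal sketch on a made-up data frame. It assumes the dplyr package is installed; toy_data, toy_training, the revenue column, and all the values are hypothetical and only mimic the shape of the real dataset.

```r
library(dplyr)

# Hypothetical toy data: column names mirror the text, values are invented.
toy_data <- data.frame(
  customer_code   = c("A1", "B2", "C3"),
  default_flag    = c("yes", "no", "no"),
  default_numeric = c(1, 0, 0),
  revenue         = c(10.5, 22.0, 7.3)
)

# A single select() call can drop both columns at once;
# a leading minus sign means "everything except this column".
toy_training <- toy_data %>% select(-default_flag, -customer_code)

names(toy_training)
# "default_numeric" "revenue"
```

Dropping both columns in one select() call is equivalent to chaining two calls; which form to use is mostly a matter of taste.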
We are now ready to fit our model:
multiple_regression <- lm(as.numeric(default_numeric) ~ ., data = training_data)
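As a rough illustration of the same call pattern, the sketch below fits a model on a tiny invented data frame using only base R. The names toy, toy_fit, x1, x2, and y are hypothetical, and the response is built without noise so that the recovered coefficients are easy to check by eye.

```r
# Made-up data: y is an exact linear function of x1 and x2 (no noise).
toy <- data.frame(
  x1 = c(1, 2, 3, 4, 5),
  x2 = c(2, 1, 4, 3, 6)
)
toy$y <- 1 + 2 * toy$x1 - 0.5 * toy$x2

# Same formula shape as in the text: response on the left, a dot on the right.
toy_fit <- lm(y ~ ., data = toy)

# The estimates should match the generating values 1, 2 and -0.5
# (up to floating-point error), named (Intercept), x1 and x2.
coef(toy_fit)
```

With real data the fit will not be exact, and summary() on the fitted object is the usual next step for inspecting standard errors and p-values.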
You should have already noticed the small point...