Chapter 4: Regression
Activity 7: Printing Various Attributes Using Model Object Without Using the summary Function
First, print the coefficient values using the following command. The output should match the estimates that the summary function reports under Coefficients. The coefficients are the parameter estimates obtained by fitting the model with the ordinary least squares (OLS) algorithm:
multiple_PM25_linear_model$coefficients
The output is as follows:
(Intercept)        DEWP        TEMP         Iws
161.1512066   4.3841960  -5.1335111  -0.2743375
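The check against the summary output can be sketched as follows. This is a minimal illustration, not the PM2.5 model itself: it assumes R's built-in mtcars data and a hypothetical stand-in model named demo_model.

```r
# Hypothetical stand-in model on the built-in mtcars data (not the PM2.5 model)
demo_model <- lm(mpg ~ wt + hp + qsec, data = mtcars)

# Coefficients taken directly from the model object
direct_coefs <- demo_model$coefficients

# The same estimates as reported by summary(), in the "Estimate" column
summary_coefs <- summary(demo_model)$coefficients[, "Estimate"]

# TRUE when both routes give the same values
all.equal(direct_coefs, summary_coefs)
```

Reading the coefficients straight from the model object avoids computing the full summary when only the estimates are needed.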
Find the residuals, that is, the differences between the actual and predicted values of PM2.5, which should be as small as possible. A residual reflects how far the value fitted using the coefficients is from the actual value:
multiple_PM25_linear_model$residuals
The output is as follows:
         25          26          27          28
17.95294914 32.81291348 21.38677872 26.34105878
         29          30          31          32
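The definition of a residual as actual minus fitted can be verified by hand. The sketch below assumes the built-in mtcars data and a hypothetical demo_model, because the PM2.5 data frame is not reproduced in this section.

```r
# Hypothetical stand-in model on the built-in mtcars data
demo_model <- lm(mpg ~ wt + hp, data = mtcars)

# Residual = actual value of the target minus the fitted value
manual_residuals <- mtcars$mpg - demo_model$fitted.values

# Compare with the residuals stored in the model object (names ignored)
all.equal(unname(manual_residuals), unname(demo_model$residuals))

# With an intercept in the model, OLS residuals sum to (numerically) zero
sum(demo_model$residuals)
```

Because the residuals of an OLS fit with an intercept always sum to zero, their spread is usually judged through the residual sum of squares or the residual standard error rather than their total.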
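Next, find the fitted values, which are the model's predictions for the observations it was trained on. The better the model, the closer these are to the actual PM2.5 values. The fitted values are computed from the coefficients and the predictor values: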
multiple_PM25_linear_model$fitted.values
The output is as follows:
        25         26         27         28         29
111.047051 115.187087 137.613221 154.658941 154.414781
        30         31         32         33         34
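How the fitted values follow from the coefficients can be sketched by multiplying the design matrix by the coefficient vector. The example assumes the built-in mtcars data and a hypothetical demo_model rather than the PM2.5 model.

```r
# Hypothetical stand-in model on the built-in mtcars data
demo_model <- lm(mpg ~ wt + hp, data = mtcars)

# Design matrix: a column of 1s for the intercept plus one column per predictor
design_matrix <- model.matrix(demo_model)

# Fitted value = intercept + sum of (coefficient * predictor value) for each row
manual_fitted <- drop(design_matrix %*% demo_model$coefficients)

# Compare with the fitted values stored in the model object (names ignored)
all.equal(unname(manual_fitted), unname(demo_model$fitted.values))
```

The same relationship ties the printed outputs together: for row 25, the fitted value 111.047051 plus the residual 17.95294914 gives approximately 129, which would be the observed PM2.5 value for that row.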
Find the R-squared value. It should match the value printed next to the text Multiple R-squared in the output of the summary function. R-squared measures the proportion of the variance in the target variable that the model explains, which helps in evaluating model performance. The closer the value is to 1, the better the model fits the data:
summary(multiple_PM25_linear_model)$r.squared
The output is as follows:
[1] 0.2159579
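A value of 0.2159579 means the model explains roughly 21.6% of the variance in PM2.5. The way R-squared is derived from the residuals can be sketched as below, again assuming the built-in mtcars data and a hypothetical demo_model.

```r
# Hypothetical stand-in model on the built-in mtcars data
demo_model <- lm(mpg ~ wt + hp, data = mtcars)

# Residual sum of squares: variation the model leaves unexplained
rss <- sum(demo_model$residuals^2)

# Total sum of squares: variation around the mean of the target
tss <- sum((mtcars$mpg - mean(mtcars$mpg))^2)

# R-squared = 1 - RSS / TSS
manual_r_squared <- 1 - rss / tss

# Compare with the value reported by summary()
all.equal(manual_r_squared, summary(demo_model)$r.squared)
```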
Find the F-statistic. The output should match the values printed next to the text F-statistic in the output of the summary function. The F-statistic tells you whether your model fits better than simply using the mean of the target variable. In many practical applications, the F-statistic is used along with its p-value:
summary(multiple_PM25_linear_model)$fstatistic
The output is as follows:
    value     numdf     dendf
 3833.506     3.000 41753.000
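The fstatistic element holds only the statistic and its degrees of freedom, not the p-value that summary prints alongside it. One way to recover that p-value, and to see how the statistic relates to R-squared, is sketched below, assuming the built-in mtcars data and a hypothetical demo_model.

```r
# Hypothetical stand-in model on the built-in mtcars data
demo_model <- lm(mpg ~ wt + hp, data = mtcars)
model_summary <- summary(demo_model)

# Named vector with elements "value", "numdf", and "dendf"
f_stat <- model_summary$fstatistic

# p-value of the overall F-test: upper tail of the F distribution
overall_p_value <- pf(f_stat["value"], f_stat["numdf"], f_stat["dendf"],
                      lower.tail = FALSE)

# The F-statistic can also be written in terms of R-squared
r2 <- model_summary$r.squared
manual_f <- (r2 / f_stat["numdf"]) / ((1 - r2) / f_stat["dendf"])

all.equal(unname(manual_f), unname(f_stat["value"]))
```

Applying the same formula to the values printed in this activity, (0.2159579 / 3) / ((1 - 0.2159579) / 41753), gives roughly 3833.5, in line with the reported statistic.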
Finally, find the coefficient p-values. They should match the values in the column titled Pr(>|t|) under Coefficients in the output of the summary function. If a p-value is less than 0.05, the variable is statistically significant in predicting the target variable:
summary(multiple_PM25_linear_model)$coefficients[,4]
The output is as follows:
  (Intercept)          DEWP          TEMP           Iws
 0.000000e+00  0.000000e+00  0.000000e+00 4.279601e-224
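A printed p-value of 0.000000e+00 most likely means the value is smaller than the smallest positive number R can represent, not that it is exactly zero. How the p-values arise from the t-statistics, and how the 0.05 threshold can be applied, is sketched below under the assumption of the built-in mtcars data and a hypothetical demo_model.

```r
# Hypothetical stand-in model on the built-in mtcars data
demo_model <- lm(mpg ~ wt + hp, data = mtcars)
coef_table <- summary(demo_model)$coefficients

# t-statistic for each coefficient (estimate divided by its standard error)
t_values <- coef_table[, "t value"]

# Two-sided p-value from the t distribution with the residual degrees of freedom
manual_p_values <- 2 * pt(abs(t_values), df = demo_model$df.residual,
                          lower.tail = FALSE)

# Compare with the Pr(>|t|) column reported by summary()
all.equal(manual_p_values, coef_table[, "Pr(>|t|)"])

# Names of the terms that are significant at the 0.05 level
significant <- names(manual_p_values)[manual_p_values < 0.05]
```

Selecting the column by name, "Pr(>|t|)", instead of by position 4 makes the intent clearer and guards against picking the wrong column.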
Understanding the attributes of a model is as important as obtaining its predictions, especially in linear regression. These attributes help in interpreting the model and in connecting the problem to its real use case.