Measuring performance in regression models
Let's get more technical and discuss the most common metrics available when dealing with regression models. First of all, let's recall what a regression is: we are trying to explain a response variable with a set of explanatory variables. A reasonable model performance metric will therefore be one that summarizes how well our model is able to explain the response variable.
It is no surprise that the two most popular regression model metrics both do exactly this:
- Mean squared error
- R-squared
Both of them are based on the concept of error, which we already encountered when dealing with model coefficient estimation. For a given record of the estimation dataset, we defined the error as the difference between the actual value of the response variable and the value of the response variable we estimate with our model:
$$e_i = y_i - \hat{y}_i$$
Here $y_i$ is the actual value of the response variable for record $i$ and $\hat{y}_i$ is the value estimated by the model.
We also called this the residual, and used it to test some of the most relevant assumptions of linear regression models...
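To make the link between the error and the two metrics concrete, here is a minimal Python sketch. It assumes the usual textbook definitions (mean squared error as the average of the squared residuals, R-squared as one minus the ratio of the residual sum of squares to the total sum of squares). The function names and the small `actual` and `predicted` lists are hypothetical, made up purely for illustration.

```python
def residuals(actual, predicted):
    """Error for each record: actual response minus estimated response."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return [a - p for a, p in zip(actual, predicted)]


def mean_squared_error(actual, predicted):
    """Average of the squared residuals."""
    errors = residuals(actual, predicted)
    return sum(e ** 2 for e in errors) / len(errors)


def r_squared(actual, predicted):
    """1 - (residual sum of squares / total sum of squares)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum(e ** 2 for e in residuals(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Illustrative values only, not taken from any real dataset.
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.5, 5.5, 7.0, 8.0]

print(residuals(actual, predicted))
print(mean_squared_error(actual, predicted))
print(r_squared(actual, predicted))
```

Both metrics start from the same list of residuals: the mean squared error averages their squares, while R-squared compares their sum of squares with the variability of the response around its mean.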