Computing regression accuracy
Now that we know how to build a regressor, it's important to understand how to evaluate its quality as well. In this context, an error is defined as the difference between the actual value and the value predicted by the regressor.
Getting ready
Let's quickly go over the metrics that can be used to measure the quality of a regressor, such as the following:
- Mean absolute error: This is the average of absolute errors of all the datapoints in the given dataset.
- Mean squared error: This is the average of the squares of the errors of all the datapoints in the given dataset. It is one of the most popular metrics out there!
- Median absolute error: This is the median of the absolute errors of all the datapoints in the given dataset. The main advantage of this metric is that it's robust to outliers; a single bad point in the test dataset won't skew the entire error metric, as it would with a mean-based error metric (see the short sketch after this list).
- Explained variance score: This score measures how well our model can account for the variation in our dataset. A score of 1.0 indicates that our model is perfect.
- R2 score: Pronounced R-squared, this is the coefficient of determination. It tells us how well unknown samples will be predicted by our model. The best possible score is 1.0, and the values can be negative as well (when the model performs worse than simply predicting the mean).
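To make these definitions concrete, here is a minimal sketch with made-up numbers (NumPy only, no scikit-learn yet). The arrays are purely illustrative; the single bad prediction shows why the median absolute error is robust to outliers while the mean-based metrics are not:
import numpy as np

# Hypothetical values: one prediction is badly off (an outlier error)
y_true = np.array([1.0, 2.0, 3.0, 4.0, 100.0])
y_pred = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

errors = y_true - y_pred            # [0, 0, 0, 0, 95]

mae = np.mean(np.abs(errors))       # 19.0   -> pulled up by the single outlier
mse = np.mean(errors ** 2)          # 1805.0 -> squaring amplifies the outlier even more
med_ae = np.median(np.abs(errors))  # 0.0    -> barely notices the outlier

print("Mean absolute error =", mae)
print("Mean squared error =", mse)
print("Median absolute error =", med_ae)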
How to do it…
The sklearn.metrics module in scikit-learn provides functions to compute all of these metrics. Open a new Python file and add the following lines:
import sklearn.metrics as sm

# y_test and y_test_pred come from the regressor built in the previous recipe
print("Mean absolute error =", round(sm.mean_absolute_error(y_test, y_test_pred), 2))
print("Mean squared error =", round(sm.mean_squared_error(y_test, y_test_pred), 2))
print("Median absolute error =", round(sm.median_absolute_error(y_test, y_test_pred), 2))
print("Explained variance score =", round(sm.explained_variance_score(y_test, y_test_pred), 2))
print("R2 score =", round(sm.r2_score(y_test, y_test_pred), 2))
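If you don't have the previous recipe's variables handy, here is a minimal, self-contained stand-in that produces y_test and y_test_pred from made-up data; the synthetic dataset and the plain linear regressor are assumptions for illustration only, not the book's original data:
import numpy as np
from sklearn import linear_model
import sklearn.metrics as sm

# Made-up noisy linear data, purely for illustration
rng = np.random.RandomState(0)
X = rng.uniform(-5, 5, size=(60, 1))
y = 1.8 * X.ravel() + 3.0 + rng.normal(scale=0.5, size=60)

# Simple train/test split
X_train, y_train = X[:40], y[:40]
X_test, y_test = X[40:], y[40:]

# Fit a basic linear regressor and predict on the test set
regressor = linear_model.LinearRegression()
regressor.fit(X_train, y_train)
y_test_pred = regressor.predict(X_test)

print("Mean squared error =", round(sm.mean_squared_error(y_test, y_test_pred), 2))
print("R2 score =", round(sm.r2_score(y_test, y_test_pred), 2))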
Keeping track of every single metric can get tedious, so we pick one or two metrics to evaluate our model. A good practice is to make sure that the mean squared error is low and the explained variance score is high.
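If you end up comparing several candidate models, a small helper can bundle those two headline numbers; the summarize function below is a hypothetical convenience, not part of scikit-learn:
import sklearn.metrics as sm

def summarize(y_true, y_pred):
    # Return the two suggested headline metrics for quick model comparison
    return {
        "mean_squared_error": round(sm.mean_squared_error(y_true, y_pred), 2),
        "explained_variance": round(sm.explained_variance_score(y_true, y_pred), 2),
    }

# Example: summarize(y_test, y_test_pred)
# A lower mean squared error and a higher explained variance indicate a better fit.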