The Regularization Cookbook

You're reading from   The Regularization Cookbook Explore practical recipes to improve the functionality of your ML models

Arrow left icon
Product type Paperback
Published in Jul 2023
Publisher Packt
ISBN-13 9781837634088
Length 424 pages
Edition 1st Edition
Languages
Tools
Arrow right icon
Author (1):
Arrow left icon
Vincent Vandenbussche Vincent Vandenbussche
Author Profile Icon Vincent Vandenbussche
Vincent Vandenbussche
Arrow right icon
View More author details
Toc

Table of Contents (14) Chapters Close

Preface 1. Chapter 1: An Overview of Regularization 2. Chapter 2: Machine Learning Refresher FREE CHAPTER 3. Chapter 3: Regularization with Linear Models 4. Chapter 4: Regularization with Tree-Based Models 5. Chapter 5: Regularization with Data 6. Chapter 6: Deep Learning Reminders 7. Chapter 7: Deep Learning Regularization 8. Chapter 8: Regularization with Recurrent Neural Networks 9. Chapter 9: Advanced Regularization in Natural Language Processing 10. Chapter 10: Regularization in Computer Vision 11. Chapter 11: Regularization in Computer Vision – Synthetic Image Generation 12. Index 13. Other Books You May Enjoy

Evaluating a model

Once the model has been trained, it is important to evaluate it. In this recipe, we will review some typical metrics for both classification and regression before evaluating our model on the test set.

Getting ready

Many evaluation metrics exist. If we take a step back and think about binary classification, there are only four possible cases:

  • False positive (FP): Positive prediction, negative ground truth
  • True positive (TP): Positive prediction, positive ground truth
  • True negative (TN): Negative prediction, negative ground truth
  • False negative (FN): Negative prediction, positive ground truth:
Figure 2.6 – Representation of false positive, true positive, true negative, and false negative

Based on this, we can define a wide range of evaluation metrics.
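
As a quick sanity check, these four counts can be read directly off scikit-learn’s confusion_matrix() function. Here is a minimal sketch on made-up labels (the y_true and y_pred arrays below are hypothetical, not the recipe’s data):

from sklearn.metrics import confusion_matrix
# Hypothetical binary labels: 1 = survived, 0 = did not survive
y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0]
# Rows are ground truth, columns are predictions:
# [[TN, FP],
#  [FN, TP]]
print(confusion_matrix(y_true, y_pred))

This prints [[2 1] [1 2]]: two true negatives, one false positive, one false negative, and two true positives.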

One of the most common metrics is accuracy, which is the proportion of correct predictions. The definition of accuracy is as follows:

\text{accuracy} = \frac{TP + TN}{TP + TN + FP + FN}

Note

Although very common, accuracy may be misleading, especially for imbalanced labels. For example, let’s assume an extreme case where 99% of Titanic passengers survived, and we have a model that predicts that every passenger survived. Our model would have 99% accuracy but would be wrong for 100% of the passengers who did not survive.
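
To make this concrete, here is a small sketch of that extreme case, with made-up labels where 99 out of 100 passengers survived:

from sklearn.metrics import accuracy_score
# Hypothetical, heavily imbalanced labels: 99 survivors, 1 non-survivor
y_true = [1] * 99 + [0]
# A degenerate model that predicts "survived" for everyone
y_pred = [1] * 100
print(accuracy_score(y_true, y_pred))  # 0.99

The constant model scores 99% accuracy while never identifying a single non-survivor.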

There are several other very common metrics, such as precision, recall, and the F1 score.

Precision is most suited when you’re trying to maximize the true positives and minimize the false positives – for example, making sure you detect only surviving passengers:

\text{precision} = \frac{TP}{TP + FP}

Recall is most suited when you’re trying to maximize the true positives and minimize the false negatives – for example, making sure you don’t miss any surviving passengers:

\text{recall} = \frac{TP}{TP + FN}

The F1 score combines precision and recall as their harmonic mean:

F_1 = 2 \times \frac{\text{precision} \times \text{recall}}{\text{precision} + \text{recall}}
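
All three metrics are available in scikit-learn. Here is a minimal sketch, reusing the hypothetical labels from the confusion matrix example above:

from sklearn.metrics import precision_score, recall_score, f1_score
y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0]
# With TP = 2, FP = 1, and FN = 1, all three metrics equal 2/3 here
print('precision:', precision_score(y_true, y_pred))
print('recall:', recall_score(y_true, y_pred))
print('f1 score:', f1_score(y_true, y_pred))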

Another useful classification evaluation metric is the Receiver Operating Characteristic Area Under Curve (ROC AUC) score.

All these metrics behave similarly: they take values between 0 and 1, and the higher the value, the better the model. Some are also more robust to imbalanced labels, especially the F1 score and ROC AUC.
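
Note that, unlike the other metrics, the ROC AUC is computed from predicted scores or probabilities rather than hard labels. Here is a minimal sketch with hypothetical probabilities; with a fitted scikit-learn classifier, you would typically pass model.predict_proba(X_test)[:, 1] instead:

from sklearn.metrics import roc_auc_score
y_true = [1, 1, 0, 0, 1, 0]
# Hypothetical predicted probabilities of the positive class
y_scores = [0.9, 0.4, 0.2, 0.6, 0.8, 0.1]
print('ROC AUC:', roc_auc_score(y_true, y_scores))  # ≈ 0.889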

For regression tasks, the most commonly used metrics are the mean squared error (MSE) and the R2 score.

The MSE is the average squared difference between the predictions and the ground truth:

MSE = \frac{1}{m} \sum_{i=1}^{m} (\hat{y}_i - y_i)^2

Here, m is the number of samples, ŷ denotes the predictions, and y denotes the ground truth:

Figure 2.7 – Visualization of the errors for a regression task

The R2 score, on the other hand, is a metric that can be negative; it is defined as follows:

R^2 = 1 - \frac{\sum_{i=1}^{m} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{m} (y_i - \bar{y})^2}

Here, ȳ is the mean of the ground truth values.

Note

While the R2 score is a typical evaluation metric (the closer to 1, the better), the MSE is more typical of a loss function (the closer to 0, the better).
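
Both regression metrics are also available in scikit-learn. Here is a minimal sketch on made-up predictions and ground truth values:

from sklearn.metrics import mean_squared_error, r2_score
# Hypothetical ground truth and predictions for a regression task
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.5, 5.0, 3.0, 8.0]
print('MSE:', mean_squared_error(y_true, y_pred))  # 0.5625
print('R2 score:', r2_score(y_true, y_pred))       # ≈ 0.847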

How to do it…

Assuming our chosen evaluation metric here is accuracy, a very simple way to evaluate our model is to use the accuracy_score() function:

from sklearn.metrics import accuracy_score
# Compute the accuracy of our model on the test set
print('accuracy on test set:', accuracy_score(y_test, y_pred))

This outputs the following:

accuracy on test set: 0.7877094972067039

Here, the accuracy_score() function returns an accuracy of 78.77%, meaning about 79% of our model’s predictions on the test set are correct.

See also

Here is a list of the available metrics in scikit-learn: https://scikit-learn.org/stable/modules/classes.html#module-sklearn.metrics.
