Performing hyperparameter optimization
In this recipe, we will explain what hyperparameter optimization is and some related concepts: the definition of a hyperparameter, cross-validation, and various hyperparameter optimization methods. We will then perform a grid search to optimize the hyperparameters of the logistic regression task on the Titanic dataset.
Getting ready
Most of the time, in ML, we do not simply train a model on the training set and evaluate it against the test set.
This is because, like most other algorithms, ML algorithms can be fine-tuned. This fine-tuning process allows us to optimize hyperparameters to achieve the best possible results, and it can also act as a lever for regularizing a model.
Note
In ML, hyperparameters are set by the practitioner and can therefore be tuned, unlike parameters, which are learned during the model training process and thus can’t be set directly.
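To make the distinction concrete, here is a minimal sketch (not part of the recipe itself): with a LogisticRegression model, C is a hyperparameter we choose up front, while the coefficients stored in coef_ are parameters learned by fit():
# Minimal sketch: hyperparameter (chosen by us) versus parameters (learned)
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=100, random_state=0)
model = LogisticRegression(C=0.1)  # C is a hyperparameter: we set it ourselves
model.fit(X, y)
print(model.coef_)  # coef_ holds parameters: learned during training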
To properly optimize hyperparameters, a third split has to be introduced: the validation set.
This means there are now three splits:
- The training set: Where the model is trained
- The validation set: Where the hyperparameters are optimized
- The test set: Where the model is evaluated
You could create such a set by splitting X_train into X_train and X_valid with the train_test_split() function from scikit-learn.
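As a quick sketch (assuming X_train and y_train already exist from the previous recipe, and that a 20% validation fraction is acceptable), this could look as follows:
# Carve a validation set out of the existing training set
from sklearn.model_selection import train_test_split

X_train, X_valid, y_train, y_valid = train_test_split(
    X_train, y_train, test_size=0.2, random_state=0)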
But in practice, most people just use cross-validation and do not bother creating this validation set. The k-fold cross-validation method allows us to split the training set into k folds, as presented in Figure 2.8:
Figure 2.8 – Typical split between training, validation, and test sets, without cross-validation (top) and with cross-validation (bottom)
In doing so, not just one model is trained but k models for a given set of hyperparameters. The performances are then averaged over those k models, based on a chosen metric (for example, accuracy, MSE, and so on).
Several sets of hyperparameters can then be tested, and the one that yields the best performance is selected. After selecting the best hyperparameter set, the model is trained one more time on the entire training set to make the most of the data available for training.
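For illustration, here is a minimal sketch of how a single hyperparameter set could be scored this way with scikit-learn's cross_val_score (again assuming X_train and y_train exist from the previous recipe):
# Average the accuracy of one candidate model over 5 folds
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

scores = cross_val_score(LogisticRegression(), X_train, y_train,
    scoring='accuracy', cv=5)
print(scores.mean())  # mean performance over the 5 folds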
Finally, you can implement several strategies to optimize the hyperparameters, as follows:
- Grid search: Test all combinations of the provided values of hyperparameters
- Random search: Randomly sample combinations of hyperparameters (a minimal sketch follows this list)
- Bayesian search: Perform Bayesian optimization on the hyperparameters
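As a side note, a random search can be written almost the same way as the grid search we are about to build, using scikit-learn's RandomizedSearchCV; the following is only a sketch, and the log-uniform distribution chosen for C is an arbitrary example:
# Random search sketch: sample 10 values of C instead of testing a fixed grid
from scipy.stats import loguniform
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV

random_search = RandomizedSearchCV(
    LogisticRegression(),
    param_distributions={'C': loguniform(1e-3, 1e1)},
    n_iter=10,
    scoring='accuracy',
    cv=5,
    random_state=0
)
# random_search.fit(X_train, y_train) would then behave like grid.fit() below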
How to do it…
While rather complicated to explain conceptually, hyperparameter optimization with cross-validation is super easy to implement. In this recipe, we’ll assume that we want to optimize a logistic regression model to predict whether a passenger would have survived:
- First, we need to import the GridSearchCV class from sklearn.model_selection.
- We would like to test the following hyperparameter values for C: [0.01, 0.03, 0.1]. We must define a parameter grid with the hyperparameter as the key and the list of values to test as the value.
The C hyperparameter is the inverse of the penalization strength: the higher C is, the lower the regularization. See the next chapter for more details:
# Define the hyperparameters we want to test
param_grid = {
    'C': [0.01, 0.03, 0.1]
}
- Finally, let’s assume we want to optimize our model on accuracy, with five cross-validation folds. To do this, we will instantiate the GridSearchCV object and provide the following arguments:
    - The model to optimize, which is a LogisticRegression instance
    - The parameter grid, param_grid, which we defined previously
    - The scoring on which to optimize – that is, accuracy
    - The number of cross-validation folds, which has been set to 5 here
- We must also set return_train_score to True to get some useful information we can use later:
# Instantiate the grid search object
grid = GridSearchCV(
LogisticRegression(),
param_grid,
scoring='accuracy',
cv=5,
return_train_score=True
)
- Finally, all we have to do is fit this object on the training set. This will automatically perform all the computations and store the results:
# Fit and wait
grid.fit(X_train, y_train)
GridSearchCV(cv=5, estimator=LogisticRegression(),
param_grid={'C': [0.01, 0.03, 0.1]},
return_train_score=True, scoring='accuracy')
Note
Depending on the input dataset and the number of tested hyperparameters, the fit may take some time.
Once the fit has been completed, you can get a lot of useful information, such as the following:
- The best hyperparameter set via the best_params_ attribute
- The best accuracy score via the best_score_ attribute
- The cross-validation results via the cv_results_ attribute, as shown in the short snippet after this list
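For instance, you could inspect them as follows (a short sketch; pandas is only used here to display cv_results_ as a table):
# Inspect the grid search results
import pandas as pd

print(grid.best_params_)  # best hyperparameter values found
print(grid.best_score_)   # mean cross-validated accuracy for those values
print(pd.DataFrame(grid.cv_results_)[
    ['params', 'mean_train_score', 'mean_test_score']])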
- Finally, you can perform inference with the model that was trained with the optimized hyperparameters, using the predict() method:
y_pred = grid.predict(X_test)
- Optionally, you can evaluate the chosen model with the accuracy score:
print('Hyperparameter optimized accuracy:',
      accuracy_score(y_test, y_pred))
This provides the following output:
Hyperparameter optimized accuracy: 0.781229050279329
Thanks to the tools provided by scikit-learn, it is fairly easy to have a well-optimized model and evaluate it against several metrics. In the next recipe, we’ll learn how to diagnose bias and variance based on such an evaluation.
See also
The documentation for GridSearchCV can be found at https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html.