The Regularization Cookbook

Machine Learning Refresher

Machine learning (ML) is much more than just models. It is about following a certain process and best practices. This chapter will provide a refresher on these: from loading and splitting data to model training, evaluation, and hyperparameter optimization, the main steps and methods will be explained here.

In this chapter, we are going to cover the following main topics:

  • Loading data
  • Splitting data
  • Preparing quantitative data
  • Preparing qualitative data
  • Training a model
  • Evaluating a model
  • Performing hyperparameter optimization

Even though the recipes in this chapter are independent from a methodological standpoint, they build upon each other and are meant to be executed sequentially.

Technical requirements

In this chapter, you will need to be able to run code to load datasets, prepare data, and train, optimize, and evaluate ML models. To do so, you will need the following libraries:

  • numpy
  • pandas
  • scikit-learn

They can be installed using pip with the following command line:

pip install numpy pandas scikit-learn

Note

In this book, some best practices such as using virtual environments won’t be explicitly mentioned. However, it is highly recommended that you use a virtual environment before installing any library using pip or any other package manager.
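
As a minimal sketch of that practice (one common approach among several; the environment name .venv is only an example), you can create and activate a virtual environment with Python's built-in venv module before running the pip command above:

python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
pip install numpy pandas scikit-learn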

Loading data

The primary focus of this recipe is to load data from a CSV file. However, this is not the only thing that this recipe covers. Since loading the data is usually the first step in any ML project, this recipe is also a good opportunity to give a quick recap of the ML workflow, as well as the different types of data.

Getting ready

Before loading the data, we should keep in mind that an ML model follows a two-step process:

  1. Train a model on a given dataset.
  2. Reuse the trained model to infer predictions on new data.

These two steps are summarized in the following figure:

Figure 2.1 – A simple view of the two-step ML process

Of course, in most cases, this is a rather simplistic view. A more detailed view can be seen in Figure 2.2:

Figure 2.2 – A more complete view of the ML process

Let’s take a closer look at the training part of the ML process shown in Figure 2.2:

  1. First, training data is queried from a data source (this can be a database, a data lake, an open dataset, and so on).
  2. The data is preprocessed, such as via feature engineering, rescaling, and so on.
  3. A model is trained and stored (on a data lake, locally, on the edge, and so on).
  4. Optionally, the output of this model is post-processed – for example, via formatting, heuristics, business rules, and more.
  5. Optionally again, this output (with or without post-processing) is stored in a database for later reference or evaluation if needed.

Now, let’s take a look at the inference part of the ML process:

  1. The data is queried from a data source (a database, an API query, and so on).
  2. The data goes through the same preprocessing step as the training data.
  3. The trained model is fetched if it doesn’t already exist locally.
  4. The model is used to infer output.
  5. Optionally, the output of the model is post-processed via the same post-processing step as during training.
  6. Optionally, the output is stored in a database for monitoring and later reference.

Even in this schema, many steps were not mentioned: splitting data for training purposes, using evaluation metrics, cross-validation, hyperparameter optimization, and others. This chapter will dive into the more training-specific steps and apply them to the classic Titanic dataset, a binary classification problem. But first, we need to load the data.

To do so, you must download the Titanic dataset training set locally. This can be performed with the following command line:

wget https://raw.githubusercontent.com/PacktPublishing/The-Regularization-Cookbook/main/chapter_02/train.csv

How to do it…

This recipe is about loading a CSV file and displaying its first few rows so that we can have a first glance at what the data looks like:

  1. The first step is to import the required libraries. Here, the only library we need is pandas:
    import pandas as pd
  2. Now, we can load the data using the read_csv function provided by pandas. The first argument is the path to the file. Assuming the file is named train.csv and located in the current folder, we only have to provide train.csv as an argument:
    # Load the data as a DataFrame
    df = pd.read_csv('train.csv')

The returned object is a DataFrame, which provides many useful methods for data processing.

  3. Now, we can display the first five rows of the loaded data using the .head() method:
    # Display the first 5 rows of the dataset
    df.head()

This code will output the following:

   PassengerId  Survived  Pclass  \
0            1         0       3
1            2         1       1
2            3         1       3
3            4         1       1
4            5         0       3

                                                Name     Sex   Age  SibSp  \
0                            Braund, Mr. Owen Harris    male  22.0      1
1  Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0      1
2                             Heikkinen, Miss. Laina  female  26.0      0
3       Futrelle, Mrs. Jacques Heath (Lily May Peel)  female  35.0      1
4                           Allen, Mr. William Henry    male  35.0      0

   Parch            Ticket     Fare Cabin Embarked
0      0         A/5 21171   7.2500   NaN        S
1      0          PC 17599  71.2833   C85        C
2      0  STON/O2. 3101282   7.9250   NaN        S
3      0            113803  53.1000  C123        S
4      0            373450   8.0500   NaN        S

Here is a description of the data types in each column:

  • PassengerId (qualitative): A unique, arbitrary ID for each passenger.
  • Survived (qualitative): 1 for yes, 0 for no. This is our label, so this is a binary classification problem.
  • Pclass (quantitative, discrete): The ticket class, which is arguably quantitative: is class 1 better than class 2? Most likely yes.
  • Name (unstructured): The name and title of the passenger.
  • Sex (qualitative): The registered sex of the passenger, either male or female.
  • Age (quantitative, discrete): The age of the passenger.
  • SibSp (quantitative, discrete): The number of siblings and spouses on board.
  • Parch (quantitative, discrete): The number of parents and children on board.
  • Ticket (unstructured): The ticket reference.
  • Fare (quantitative, continuous): The ticket price.
  • Cabin (unstructured): The cabin number, which is arguably unstructured. It can be seen as a qualitative feature with high cardinality.
  • Embarked (qualitative): The port of embarkation, either Southampton (S), Cherbourg (C), or Queenstown (Q).
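
To check how pandas itself interpreted these columns (a quick sketch reusing the df object loaded above), you can display the inferred data types and non-null counts:

# Show column dtypes and non-null counts
df.info()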

There’s more…

Let’s talk about the different types of data that are available. Data is a very generic word and can describe many things. We are surrounded by data all the time. One way to characterize data is through pairs of opposite properties.

Data can be structured or unstructured:

  • Structured data comes in the form of tables, databases, Excel files, CSV files, and JSON files.
  • Unstructured data does not fit in a table: it can be text, sound, images, videos, and so on. Even if we can build tabular representations of it, this kind of data does not naturally fit in an Excel table.

Data can be quantitative or qualitative.

Quantitative data is ordered. Here are some examples:

  • €100 is greater than €10
  • 1.8 meters is taller than 1.6 meters
  • 18 years old is younger than 80 years old

Qualitative data has no intrinsic order, as shown here:

  • Blue is not intrinsically better than red
  • A dog is not intrinsically greater than a cat
  • A kitchen is not intrinsically more useful than a bathroom

These are not mutually exclusive. An object can have both quantitative and qualitative features, as can be seen in the case of the car in the following figure:

Figure 2.3 – A single object depicted by both quantitative (left) and qualitative (right) features

Finally, data can be continuous or discrete.

Some data is continuous, as follows:

  • A weight
  • A volume
  • A price

On the other hand, some data is discrete:

  • A color
  • A football score
  • A nationality

Note

Discrete != qualitative.

For example, a football score is discrete, but there is an intrinsic order: 3 points is more than 2.

See also

The pandas read_csv function has a lot of flexibility as it can use other separators, handle headers, and much more. This is described in the official documentation: https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html.

The pandas library allows I/O operations that have different types of inputs. For more information, have a look at the official documentation: https://pandas.pydata.org/docs/reference/io.html.

Splitting data

After loading data, splitting it is a crucial step. This recipe will explain why we need to split data, as well as how to do it.

Getting ready

Why do we need to split data? An ML model is quite like a student.

You provide a student with many lectures and exercises, with or without the answers. But more often than not, students are evaluated on completely new problems. To succeed, they cannot simply memorize the exercises and their solutions: they must understand the underlying concepts and methods.

An ML model is no different: you train the model on training data and then evaluate it on test data. This way, you make sure the model fully understands the task and generalizes well to new, unseen data.

So, the dataset is usually split into train and test sets:

  • The train set must be as large as possible to give as many samples as possible to the model
  • The test set must be large enough to be statistically significant in evaluating the model

Typical splits range from 80%/20% for rather small datasets (for example, hundreds of samples) to 99%/1% for very large datasets (for example, millions of samples or more).

For this recipe and the others in this chapter, it is assumed that the code has been executed in the same notebook as the previous recipe since each recipe reuses the code from the previous ones.

How to do it…

Here are the steps to try out this recipe:

  1. You can split the data rather easily with scikit-learn and the train_test_split() function:
    # Import the train_test_split function
    from sklearn.model_selection import train_test_split
    # Split the data
    X_train, X_test, y_train, y_test = train_test_split(
        df.drop(columns=['Survived']), df['Survived'],
        test_size=0.2, stratify=df['Survived'],
        random_state=0)

This function uses the following parameters as input:

  • X: All columns but the 'Survived' label
  • y: The 'Survived' label column
  • test_size: This is 0.2, which means the test set will hold 20% of the samples and the train set the remaining 80%
  • stratify: This specifies the 'Survived' column to ensure the same label balance is used in both splits
  • random_state: Any fixed integer (here, 0) to ensure reproducibility

It returns the following outputs:

  • X_train: The train split of X
  • X_test: The test split of X
  • y_train: The training split of y, associated with X_train
  • y_test: The test split of y, associated with X_test

Note

The stratify option is not mandatory, but it can be critical, especially with imbalanced data: it ensures that the same label balance is used in both splits, and it can be applied to any qualitative feature, not just the labels.
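
As a quick sanity check (a minimal sketch, assuming the split above has been run), you can verify that the label proportions are indeed preserved in both splits:

# Compare the label balance of the two splits
print(y_train.value_counts(normalize=True))
print(y_test.value_counts(normalize=True))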

This split should be done as early as possible when performing data processing so that you avoid any potential data leakage. From now on, all the preprocessing will be computed on the train set, and only then applied to the test set, in agreement with Figure 2.2.

See also

See the official documentation for the train_test_split function: https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html.

Preparing quantitative data

Depending on the type of data, how the features must be prepared may differ. In this recipe, we’ll cover how to prepare quantitative data, including missing data imputation and rescaling.

Getting ready

In the Titanic dataset, as in any other dataset, there may be missing data. There are several ways to deal with it: for example, you can drop a column or a row, or impute a value. There are many imputation techniques of varying sophistication. scikit-learn supplies several implementations of imputers, such as SimpleImputer and KNNImputer.

As we will see in this recipe, using SimpleImputer, we can impute the missing quantitative data with the mean value.

Once the missing data has been handled, we can prepare the quantitative data by rescaling it so that all the data is at the same scale.

Several rescaling strategies exist, such as min-max scaling, robust scaling, standard scaling, and others.

In this recipe, we will use standard scaling. So, for each feature, we will subtract the mean value of this feature, and then divide it by the standard deviation of that feature:

z = (x − μ) / σ

Here, μ and σ are the mean and standard deviation of the feature, computed on the train set.

Fortunately, scikit-learn provides a fully working implementation via StandardScaler.

How to do it…

We will sequentially handle missing values and rescale the data in this recipe:

  1. Import the required classes – SimpleImputer for missing data imputation and StandardScaler for rescaling:
    from sklearn.impute import SimpleImputer
    from sklearn.preprocessing import StandardScaler
  2. Select the quantitative features we want to keep. Here, we will keep 'Pclass', 'Age', 'Fare', 'SibSp', and 'Parch' and store these features in new variables for both the train and test sets:
    quanti_columns = ['Pclass', 'Age', 'Fare', 'SibSp', 'Parch']
    # Get the quantitative columns
    X_train_quanti = X_train[quanti_columns]
    X_test_quanti = X_test[quanti_columns]
  3. Instantiate the simple imputer with a mean strategy. Here, the missing value of a feature will be replaced with the mean value of that feature:
    # Impute missing quantitative values with mean feature value
    quanti_imputer = SimpleImputer(strategy='mean')
  4. Fit the imputer on the train set and apply it to the test set so that it avoids leakage in the imputation:
    # Fit and impute the training set
    X_train_quanti = quanti_imputer.fit_transform(X_train_quanti)
    # Just impute the test set
    X_test_quanti = quanti_imputer.transform(X_test_quanti)
  5. Now that imputation has been performed, instantiate the scaler object:
    # Instantiate the standard scaler
    scaler = StandardScaler()
  6. Finally, fit and apply the standard scaler to the train set, and then apply it to the test set:
    # Fit and transform the training set
    X_train_quanti = scaler.fit_transform(X_train_quanti)
    # Just transform the test set
    X_test_quanti = scaler.transform(X_test_quanti)

We now have quantitative data with no missing values, fully rescaled, with no data leakage.

There’s more…

In this recipe, we used the simple imputer, assuming there was missing data. In practice, it is highly recommended that you look at the data first to check whether there are missing values, as well as how many. It is possible to look at the number of missing values per column with the following code snippet:

# Display the number of missing data for each column
X_train[quanti_columns].isna().sum()

This will output the following:

Pclass      0
Age       146
Fare        0
SibSp       0
Parch       0
Thanks to this, we know that the Age feature has 146 missing values, while the other features have no missing data.

See also

A few imputers are available in scikit-learn. The list is available here: https://scikit-learn.org/stable/modules/classes.html#module-sklearn.impute.

There are many ways to scale data, and you can find the methods that are available in scikit-learn here: https://scikit-learn.org/stable/modules/classes.html#module-sklearn.preprocessing.

You might be interested in looking at this comparison of several scalers on some given data: https://scikit-learn.org/stable/auto_examples/preprocessing/plot_all_scaling.html#sphx-glr-auto-examples-preprocessing-plot-all-scaling-py.

Preparing qualitative data

In this recipe, we will prepare qualitative data, including missing value imputation and encoding.

Getting ready

Qualitative data requires different treatment from quantitative data. Imputing missing values with the mean value of a feature would make no sense (and would not work with non-numeric data): it makes more sense, for example, to use the most frequent value (the mode) of a feature. The SimpleImputer class allows us to do such things.

The same goes for rescaling: it would make no sense to rescale qualitative data. Instead, it is more common to encode it. One of the most typical techniques is called one-hot encoding.

The idea is to transform each of the categories, out of a total of N possible categories, into a vector holding a single 1 and N-1 zeros. In our example, the Embarked feature’s one-hot encoding would be as follows:

  • ‘C’ = [1, 0, 0]
  • ‘Q’ = [0, 1, 0]
  • ‘S’ = [0, 0, 1]

Note

Having N columns for N categories is not necessarily optimal. What happens if, in the preceding example, we remove the first column? If the value is neither ‘Q’ = [1, 0] nor ‘S’ = [0, 1], then it must be ‘C’ = [0, 0]. There is no need for one more column to carry all the necessary information. This can be generalized: N categories only require N-1 columns to hold all the information, which is why one-hot encoding functions usually allow you to drop a column.
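
As a quick illustration of this drop-first idea (a minimal sketch using pandas get_dummies on a toy column, rather than the OneHotEncoder used in the recipe below), you can compare both encodings:

import pandas as pd

embarked = pd.Series(['C', 'Q', 'S', 'S'], name='Embarked')
# Full one-hot encoding: three columns for three categories
print(pd.get_dummies(embarked))
# Dropping the first category: two columns carry the same information
print(pd.get_dummies(embarked, drop_first=True))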

The scikit-learn OneHotEncoder class allows us to do this. It also allows us to deal with unknown categories that may appear in the test set (or the production environment) with several strategies, such as raising an error, ignoring them, or grouping them into an infrequent category. Finally, it allows us to drop the first column after encoding.

How to do it…

Just like in the preceding recipe, we will handle any missing data and the features will be one-hot encoded:

  1. Import the necessary classes – SimpleImputer for missing data imputation (already imported in the previous recipe) and OneHotEncoder for encoding. We also need to import numpy so that we can concatenate the qualitative and quantitative data that’s been prepared at the end of this recipe:
    import numpy as np
    from sklearn.impute import SimpleImputer
    from sklearn.preprocessing import OneHotEncoder
  2. Select the qualitative features we want to keep: 'Sex' and 'Embarked'. Then, store these features in new variables for both the train and test sets:
    quali_columns = ['Sex', 'Embarked']
    # Get the qualitative columns
    X_train_quali = X_train[quali_columns]
    X_test_quali = X_test[quali_columns]
  3. Instantiate SimpleImputer with most_frequent strategy. Any missing values will be replaced by the most frequent ones:
    # Impute missing qualitative values with most frequent feature value
    quali_imputer = SimpleImputer(strategy='most_frequent')
  4. Fit and transform the imputer on the train set, and then transform the test set:
    # Fit and impute the training set
    X_train_quali = quali_imputer.fit_transform(X_train_quali)
    # Just impute the test set
    X_test_quali = quali_imputer.transform(X_test_quali)
  5. Instantiate the encoder. Here, we will specify the following parameters:
    • drop='first': This will drop the first columns of the encoding
    • handle_unknown='ignore': If a new value appears in the test set (or in production), it will be encoded as zeros:
      # Instantiate the encoder
      encoder = OneHotEncoder(drop='first', handle_unknown='ignore')
  6. Fit and transform the encoder on the training set, and then transform the test set using this encoder:
    # Fit and transform the training set
    X_train_quali = encoder.fit_transform(X_train_quali).toarray()
    # Just encode the test set
    X_test_quali = encoder.transform(X_test_quali).toarray()

Note

We need to call .toarray() on the encoder's output because it is a sparse matrix by default and cannot be concatenated in that form with the other features.

  7. With that, all the data has been prepared – both quantitative and qualitative (considering this recipe and the previous one). It is now possible to concatenate this data before training a model:
    # Concatenate the data back together
    X_train = np.concatenate([X_train_quanti,
        X_train_quali], axis=1)
    X_test = np.concatenate([X_test_quanti, X_test_quali], axis=1)

There’s more…

It is possible to save the prepared data as a pickle file, either to share it or simply to avoid having to prepare it again later. The following code will allow us to do this:

import pickle
pickle.dump((X_train, X_test, y_train, y_test),
    open('prepared_titanic.pkl', 'wb'))
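
To reload the prepared splits later (a minimal sketch, assuming the prepared_titanic.pkl file created above is in the current folder):

import pickle
# Reload the prepared splits from the pickle file
X_train, X_test, y_train, y_test = pickle.load(
    open('prepared_titanic.pkl', 'rb'))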

We now have fully prepared data that can be used to train ML models.

Note

Several steps have been omitted or simplified here for clarity. Data may need more preparation, such as more thorough missing value imputation, outlier and duplicate detection (and perhaps removal), feature engineering, and so on. It is assumed that you already have some sense of these aspects; you are encouraged to read other materials on the topic if required.

See also

This more general documentation about missing data imputation is worth looking at: https://scikit-learn.org/stable/modules/impute.html.

Finally, this more general documentation about data preprocessing can be very useful: https://scikit-learn.org/stable/modules/preprocessing.html.

Training a model

Once data has been fully cleaned and prepared, it is fairly easy to train a model thanks to scikit-learn. In this recipe, before training a logistic regression model on the Titanic dataset, we will quickly recap the ML paradigm and the different types of ML we can use.

Getting ready

If you were asked how to differentiate a car from a truck, you may be tempted to provide a list of rules, such as the number of wheels, size, weight, and so on. By doing so, you would be able to provide a set of explicit rules that would allow anyone to identify a car and a truck as different types of vehicles.

Traditional programming is not so different. While developing algorithms, programmers often build explicit rules, which allow them to map from data input (for example, a vehicle) to answers (for example, a car). We can summarize this paradigm as data + rules = answers.

If we were to train an ML model to discriminate cars from trucks, we would use another strategy: we would feed an ML algorithm with many pieces of data and their associated answers, expecting the model to learn the correct rules by itself. This is a different approach that can be summarized as data + answers = rules. This paradigm difference is summarized in Figure 2.4. As trivial as it might look to ML practitioners, it changes everything in terms of regularization:

Figure 2.4 – Comparing traditional programming with ML algorithms

Regularizing traditional algorithms is conceptually straightforward. For example, what if the rules defining a truck overlap with the definition of a bus? In that case, we can simply add another rule, such as the fact that buses have lots of windows.

Regularization in ML is intrinsically implicit. What if, in this case, the model does not discriminate between buses and trucks?

  • Should we add more data?
  • Is the model complex enough to capture such a difference?
  • Is it underfitting or overfitting?

This fundamental property of ML makes regularization complex.

ML can be applied to many tasks. Anyone who uses ML knows there is not just one type of ML model.

Arguably, most ML models fall into three main categories:

  • Supervised learning
  • Unsupervised learning
  • Reinforcement learning

As is usually the case for categories, the landscape is more complex, with sub-categories and methods overlapping several categories. But this is beyond the scope of this book.

This book will focus on regularization for supervised learning. In supervised learning, the problem is usually quite easy to specify: we have input features, X (for example, apartment surface), and labels, y (for example, apartment price). The goal is to train a model so that it’s robust enough to predict y, given X.

The two major types of supervised learning tasks are classification and regression:

  • Classification: The labels are made of qualitative data. For example, the task is to predict between two or more classes, such as car, bus, and truck.
  • Regression: The labels are made of quantitative data. For example, the task is to predict a continuous value, such as an apartment price.

Again, the line can be blurry: some tasks with quantitative labels can be solved as classification problems, while other tasks can be framed as either classification or regression. See Figure 2.5:

Figure 2.5 – Regression versus classification

How to do it…

Assuming we want to train a logistic regression model (which will be explained properly in the next chapter), the scikit-learn library provides the LogisticRegression class, along with the fit() and predict() methods. Let’s learn how to use it:

  1. Import the LogisticRegression class:
    from sklearn.linear_model import LogisticRegression
  2. Instantiate a LogisticRegression object:
    # Instantiate the model
    lr = LogisticRegression()
  3. Fit the model on the train set:
    # Fit on the training data
    lr.fit(X_train, y_train)
  4. Optionally, compute predictions by using that model on the test set:
    # Compute and store predictions on the test data
    y_pred = lr.predict(X_test)

See also

Even though more details will be provided in the next chapter, you might be interested in looking at the documentation of the LogisticRegression class: https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html.

Evaluating a model

Once the model has been trained, it is important to evaluate it. In this recipe, we will provide a few insights into typical metrics for both classification and regression, before evaluating our model on the test set.

Getting ready

Many evaluation metrics exist. If we take a step back and think about binary classification, each prediction falls into one of only four cases:

  • False positive (FP): Positive prediction, negative ground truth
  • True positive (TP): Positive prediction, positive ground truth
  • True negative (TN): Negative prediction, negative ground truth
  • False negative (FN): Negative prediction, positive ground truth:
Figure 2.6 – Representation of false positive, true positive, true negative, and false negative

Based on this, we can define a wide range of evaluation metrics.

One of the most common metrics is accuracy, which is the ratio of good predictions among all predictions. The definition of accuracy is as follows:

accuracy = (TP + TN) / (TP + TN + FP + FN)

Note

Although very common, the accuracy may be misleading, especially for imbalanced labels. For example, let’s assume an extreme case where 99% of Titanic passengers survived, and we have a model that predicts that every passenger survived. Our model would have a 99% accuracy but would be wrong for 100% of passengers who did not survive.

There are several other very common metrics, such as precision, recall, and the F1 score.

Precision is most suited when you’re trying to maximize the true positives and minimize the false positives – for example, making sure you detect only surviving passengers:

precision = TP / (TP + FP)

Recall is most suited when you’re trying to maximize the true positives and minimize the false negatives – for example, making sure you don’t miss any surviving passengers:

recall = TP / (TP + FN)

The F1 score is just a combination of the precision and recall metrics as a harmonic mean:

F1 = 2 × (precision × recall) / (precision + recall)

Another useful classification evaluation metric is the Receiver Operating Characteristic Area Under Curve (ROC AUC) score.

All these metrics behave similarly: their values lie between 0 and 1, and the higher the value, the better the model. Some are also more robust to imbalanced labels, especially the F1 score and ROC AUC.
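
As a minimal sketch (assuming the lr model, X_test, y_test, and y_pred from the previous recipes are still available), these metrics can be computed with scikit-learn:

from sklearn.metrics import (precision_score, recall_score,
    f1_score, roc_auc_score)

print('precision:', precision_score(y_test, y_pred))
print('recall:', recall_score(y_test, y_pred))
print('F1 score:', f1_score(y_test, y_pred))
# ROC AUC is usually computed on predicted probabilities
print('ROC AUC:', roc_auc_score(y_test, lr.predict_proba(X_test)[:, 1]))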

For regression tasks, the most used metrics are the mean squared error (MSE) and the R2 score.

The MSE is the averaged squared difference between the predictions and the ground truth:

MSE = (1/m) Σᵢ (ŷᵢ − yᵢ)²

Here, m is the number of samples, ŷ is the predictions, and y is the ground truth:

Figure 2.7 – Visualization of the errors for a regression task

In terms of the R2 score, it is a metric that can be negative and is defined as follows:

R² = 1 − Σᵢ (yᵢ − ŷᵢ)² / Σᵢ (yᵢ − ȳ)²

Here, ȳ is the mean of the ground-truth values.

Note

While the R2 score is a typical evaluation metric (the closer to 1, the better), the MSE is more typical of a loss function (the closer to 0, the better).
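
As a small, self-contained sketch (with made-up numbers, unrelated to the Titanic task), both regression metrics are also available in scikit-learn:

from sklearn.metrics import mean_squared_error, r2_score

y_true = [3.0, 2.5, 4.0, 5.1]
y_hat = [2.8, 2.7, 4.2, 4.9]
print('MSE:', mean_squared_error(y_true, y_hat))
print('R2 score:', r2_score(y_true, y_hat))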

How to do it…

Assuming our chosen evaluation metric here is accuracy, a very simple way to evaluate our model is to use the accuracy_score() function:

from sklearn.metrics import accuracy_score
# Compute the accuracy on test of our model
print('accuracy on test set:', accuracy_score(y_pred,
    y_test))

This outputs the following:

accuracy on test set: 0.7877094972067039

Here, the accuracy_score() function provides an accuracy of 78.77%, meaning about 79% of our model’s predictions are right.

See also

Here is a list of the available metrics in scikit-learn: https://scikit-learn.org/stable/modules/classes.html#module-sklearn.metrics.

Performing hyperparameter optimization

In this recipe, we will explain what hyperparameter optimization is and some related concepts: the definition of a hyperparameter, cross-validation, and various hyperparameter optimization methods. We will then perform a grid search to optimize the hyperparameters of the logistic regression task on the Titanic dataset.

Getting ready

Most of the time, in ML, we do not simply train a model on the training set and evaluate it against the test set.

This is because, like most other algorithms, ML algorithms can be fine-tuned. This fine-tuning process allows us to optimize hyperparameters to achieve the best possible results. It can also act as a lever for regularizing a model.

Note

In ML, hyperparameters can be tuned by humans, unlike parameters, which are learned through the model training process and thus cannot be set by hand.

To properly optimize hyperparameters, a third split has to be introduced: the validation set.

This means there are now three splits:

  • The training set: Where the model is trained
  • The validation set: Where the hyperparameters are optimized
  • The test set: Where the model is evaluated

You could create such a set by splitting X_train into X_train and X_valid with the train_test_split() function from scikit-learn.
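
A minimal sketch of that manual split (the X_valid and y_valid names, the 0.2 ratio, and the stratify option are just illustrative choices):

from sklearn.model_selection import train_test_split
# Carve a validation set out of the existing training split
X_train, X_valid, y_train, y_valid = train_test_split(
    X_train, y_train, test_size=0.2, stratify=y_train, random_state=0)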

But in practice, most people just use cross-validation and do not bother creating this explicit validation set. The k-fold cross-validation method divides the training set into k splits (folds), as presented in Figure 2.8:

Figure 2.8 – Typical split between training, validation, and test sets, without cross-validation (top) and with cross-validation (bottom)

In doing so, not just one model is trained for a given set of hyperparameters, but k. The performance is then averaged over those k models, based on a chosen metric (for example, accuracy, MSE, and so on).

Several sets of hyperparameters can then be tested, and the one that shows the best performance is selected. After selecting the best hyperparameter set, the model is trained one more time on the entire train set to maximize the data for training purposes.
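
This averaging can be seen directly with scikit-learn's cross_val_score function (a minimal sketch, independent of the grid search used in the recipe below):

from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression

# 5-fold cross-validation accuracy of a logistic regression
scores = cross_val_score(LogisticRegression(), X_train, y_train,
    cv=5, scoring='accuracy')
print(scores, scores.mean())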

Finally, you can implement several strategies to optimize the hyperparameters, as follows:

  • Grid search: Test all combinations of the provided values of hyperparameters
  • Random search: Randomly search combinations of hyperparameters
  • Bayesian search: Perform Bayesian optimization on the hyperparameters

How to do it…

While being rather complicated to explain conceptually, hyperparameter optimization with cross-validation is super easy to implement. In this recipe, we’ll assume that we want to optimize a logistic regression model to predict whether a passenger would have survived:

  1. First, we need to import the GridSearchCV class from sklearn.model_selection.
  2. We would like to test the following hyperparameter values for C: [0.01, 0.03, 0.1]. We must define a parameter grid with the hyperparameter as the key and the list of values to test as the value.

The C hyperparameter is the inverse of the penalization strength: the higher C is, the lower the regularization. See the next chapter for more details:

# Define the hyperparameters we want to test
param_grid = { 'C': [0.01, 0.03, 0.1] }
  3. Next, let's assume we want to optimize our model on accuracy, with five cross-validation folds. To do this, we will instantiate the GridSearchCV object and provide the following arguments:
    • The model to optimize, which is a LogisticRegression instance
    • The parameter grid, param_grid, which we defined previously
    • The scoring on which to optimize – that is, accuracy
    • The number of cross-validation folds, which has been set to 5 here
  4. We must also set return_train_score to True to get some useful information we can use later:
    # Instantiate the grid search object
    grid = GridSearchCV(
        LogisticRegression(),
        param_grid,
        scoring='accuracy',
        cv=5,
        return_train_score=True
    )
  5. Now, all we have to do is train this object on the train set. This will automatically run all the computations and store the results:
    # Fit and wait
    grid.fit(X_train, y_train)

This displays the representation of the fitted GridSearchCV object:

    GridSearchCV(cv=5, estimator=LogisticRegression(),
        param_grid={'C': [0.01, 0.03, 0.1]},
        return_train_score=True, scoring='accuracy')

Note

Depending on the input dataset and the number of tested hyperparameters, the fit may take some time.

Once the fit has been completed, you can get a lot of useful information, such as the following:

  • The best hyperparameter set via the .best_params_ attribute
  • The best accuracy score via the .best_score_ attribute
  • The detailed cross-validation results via the .cv_results_ attribute
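
For example (a small sketch, assuming the grid object fitted in the previous step):

    # Inspect the results of the grid search
    print(grid.best_params_)
    print(grid.best_score_)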
  6. Finally, you can compute predictions with the model that was trained with the optimized hyperparameters by using the .predict() method:
    y_pred = grid.predict(X_test)
  7. Optionally, you can evaluate the chosen model with the accuracy score:
    print('Hyperparameter optimized accuracy:',
        accuracy_score(y_pred, y_test))

This provides the following output:

Hyperparameter optimized accuracy: 0.781229050279329

Thanks to the tools provided by scikit-learn, it is fairly easy to have a well-optimized model and evaluate it against several metrics. In the next recipe, we’ll learn how to diagnose bias and variance based on such an evaluation.

See also

The documentation for GridSearchCV can be found at https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html.
