Model Tuning
In this section, we will delve further into evaluating model performance and examine regularization techniques that help models generalize to new data. Providing context for a model's performance is extremely important: we want to know whether the model performs well compared to trivial or obvious approaches. We do this by creating a baseline model against which the machine learning models we train are compared. Note that all model evaluation metrics are computed and reported on the test dataset, since that indicates how the model will perform on new data.
Baseline Models
A baseline model should be a simple and well-understood procedure, and its performance should be the lowest acceptable performance for any model we build. For classification models, a useful and easy baseline is to always predict the modal (most frequent) outcome value. For example, if 60% of the target values are false, our baseline model would predict false for every observation, which would give us an accuracy of 60%. For regression models, the mean or median of the target can be used as the baseline prediction.
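As a minimal sketch of both baselines, the plain-Python snippet below uses no libraries, and its label and value lists are hypothetical rather than taken from the dataset. It finds the most frequent class and the accuracy of always predicting it, then compares the mean and the median as constant regression predictions.

```python
from collections import Counter
from statistics import mean, median

def majority_class_baseline(labels):
    """Return the most frequent label and the accuracy of always predicting it."""
    most_common_label, count = Counter(labels).most_common(1)[0]
    return most_common_label, count / len(labels)

def mean_absolute_error(y_true, prediction):
    """Mean absolute error of predicting one constant for every row."""
    return sum(abs(y - prediction) for y in y_true) / len(y_true)

# Hypothetical classification target: 60% False, 40% True.
labels = [False] * 6 + [True] * 4
label, accuracy = majority_class_baseline(labels)
print(label, accuracy)  # False 0.6

# Hypothetical regression target with one large outlier.
y = [10.0, 12.0, 11.0, 13.0, 54.0]
print(mean(y), median(y))  # 20.0 12.0
print(mean_absolute_error(y, mean(y)), mean_absolute_error(y, median(y)))  # 13.6 9.2
```

In this made-up regression target, the single outlier pulls the mean well above the typical value, so the median is the stronger constant baseline under mean absolute error.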
Exercise 1.05: Determining a Baseline Model
In this exercise, we will put the model performance into context. The accuracy we attained from our model seemed good, but we need something to compare it to. Since machine learning model performance is relative, it is important to develop a robust baseline with which to compare models. Once again, we are using the online shoppers purchasing intention dataset, and our target variable is whether or not each user will purchase a product in their session. Follow these steps to complete this exercise:
- Import the pandas library and load in the target dataset:
    import pandas as pd
    target = pd.read_csv('../data/OSI_target_e2.csv')
- Next, calculate the relative proportion of each value of the target variable:
    target['Revenue'].value_counts()/target.shape[0]*100
The following figure shows the output of the preceding code:
- Here, we can see that 0 is represented 84.525547% of the time, that is, there is no purchase by the user, and this is our baseline accuracy. Now, for the other model evaluation metrics:
    from sklearn import metrics
    y_baseline = pd.Series(data=[0]*target.shape[0])
    precision, recall, fscore, _ = metrics.precision_recall_fscore_support(
        y_true=target['Revenue'], y_pred=y_baseline, average='macro')
Here, we've set the baseline model to predict 0 and have repeated the value so that it has the same number of rows as the target dataset.
Note
The average parameter in the precision_recall_fscore_support function has to be set to macro because when it is set to binary, as it was previously, the function is looking for true values, and our baseline model only consists of false values.
- Print the final output for precision, recall, and fscore:
    print(f'Precision: {precision:.4f}\nRecall: {recall:.4f}\nfscore: {fscore:.4f}')
The preceding code produces the following output:
    Precision: 0.9226
    Recall: 0.5000
    fscore: 0.4581
Now, we have a baseline model that we can compare to our previous model, as well as any subsequent models. By doing this, we can tell that while the accuracy of our previous model seemed high, it did not score much better than this baseline model.
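To see where the macro-averaged numbers come from, the arithmetic can be worked by hand. The sketch below is not the library's implementation, and the function name is hypothetical. It assumes an explicit convention for the precision of class 1, which is undefined (0/0) because the baseline never predicts that class. The reported precision of 0.9226 is consistent with counting that undefined value as 1, whereas counting it as 0 would give 0.4226. Recall and F-score are the same under both conventions.

```python
def macro_scores_all_negative(negative_share, undefined_precision):
    """Macro precision, recall and F-score when every row is predicted as class 0.

    negative_share: fraction of rows whose true label is 0.
    undefined_precision: value assigned to the precision of class 1, which is
        0/0 because class 1 is never predicted (an explicit assumption).
    """
    # Class 0: every row is predicted 0, so precision equals the share of 0s
    # and recall is 1.
    precision_0, recall_0 = negative_share, 1.0
    f_0 = 2 * precision_0 * recall_0 / (precision_0 + recall_0)
    # Class 1: never predicted, so recall is 0 and the F-score is 0.
    precision_1, recall_1, f_1 = undefined_precision, 0.0, 0.0
    # Macro averaging is the unweighted mean over the two classes.
    return ((precision_0 + precision_1) / 2,
            (recall_0 + recall_1) / 2,
            (f_0 + f_1) / 2)

share = 0.84525547  # proportion of 0 values reported in the exercise
for convention in (1.0, 0.0):
    p, r, f = macro_scores_all_negative(share, convention)
    print(f'undefined precision = {convention}: '
          f'precision {p:.4f}, recall {r:.4f}, fscore {f:.4f}')
```

Macro recall is exactly 0.5 because the baseline recovers every 0 and no 1. This is why accuracy alone flatters a majority-class predictor on imbalanced data.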
Note
To access the source code for this specific section, please refer to https://packt.live/31MD1jH.
You can also run this example online at https://packt.live/2VFFSXO.
Regularization
Earlier in this chapter, we learned about overfitting and what it looks like. The hallmark of overfitting is a model that performs extremely well on the training data yet performs poorly on the test data. One reason for this could be that the model relies too heavily on certain features that lead to good performance on the training dataset but do not generalize well to new observations or to the test dataset.
One technique that can be used to avoid this is called regularization. Regularization constrains the values of the coefficients toward zero, which discourages a complex model. There are many different types of regularization techniques. For example, in linear and logistic regression, ridge and lasso regularization are most common. In tree-based models, limiting the maximum depth of the trees acts as regularization.
Ridge and lasso correspond to two different penalty terms, namely L2 and L1. The penalty term added to the loss function is either the squared L2 norm (the sum of the squared values) of the weights or the L1 norm (the sum of the absolute values) of the weights. Because L1 regularization is able to reduce the coefficients of features to exactly zero, it acts as a feature selector. We can use the output of this model to observe which features do not contribute much to the performance and remove them entirely if desired. L2 regularization shrinks the coefficients but will not generally reduce them to exactly zero, so we will observe that they all have non-zero values.
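The difference between the two penalties can be illustrated with a small worked example. The sketch below uses hypothetical weights and a hypothetical penalty strength lam, and it reduces the problem to a single coefficient with a squared-error loss, where both minimizers have a closed form. That reduction is a simplifying assumption, not the general fitting procedure. In this setting, the L1 penalty sets small coefficients to exactly zero, while the L2 penalty only scales them down.

```python
def l1_penalty(weights):
    """Sum of the absolute values of the weights."""
    return sum(abs(w) for w in weights)

def l2_penalty(weights):
    """Sum of the squared values of the weights."""
    return sum(w * w for w in weights)

def l1_shrink(a, lam):
    """Minimizer of 0.5 * (w - a)**2 + lam * abs(w), known as soft thresholding."""
    magnitude = abs(a) - lam
    if magnitude <= 0:
        return 0.0  # the coefficient is removed entirely
    return magnitude if a > 0 else -magnitude

def l2_shrink(a, lam):
    """Minimizer of 0.5 * (w - a)**2 + 0.5 * lam * w**2, a proportional shrinkage."""
    return a / (1.0 + lam)

weights = [3.0, -0.5, 0.25]  # hypothetical unpenalized coefficients
lam = 0.5                    # hypothetical penalty strength

print(l1_penalty(weights))  # 3.75
print(l2_penalty(weights))  # 9.3125
print([l1_shrink(w, lam) for w in weights])            # [2.5, 0.0, 0.0]
print([round(l2_shrink(w, lam), 4) for w in weights])  # [2.0, -0.3333, 0.1667]
```

Both small coefficients vanish under L1, while under L2 every coefficient stays non-zero.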
The following code shows how to instantiate the models using these regularization techniques:
    from sklearn.linear_model import LogisticRegressionCV

    # Cs (defined beforehand) holds the candidate inverse regularization strengths
    model_l1 = LogisticRegressionCV(Cs=Cs, penalty='l1',
                                    cv=10, solver='liblinear',
                                    random_state=42)
    model_l2 = LogisticRegressionCV(Cs=Cs, penalty='l2',
                                    cv=10, random_state=42)
The following code shows how to fit the models:
    model_l1.fit(X_train, y_train['Revenue'])
    model_l2.fit(X_train, y_train['Revenue'])
The same concepts in lasso and ridge regularization can be applied to ANNs. However, penalization occurs on the weight matrices rather than the coefficients. Dropout is another form of regularization that's used to prevent overfitting in ANNs. Dropout randomly selects nodes at each iteration and removes them, along with their connections, as shown in the following figure:
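Separately from the figure, the mechanics of dropout can be sketched in a few lines of plain Python. The function name and the example activations are hypothetical. Rescaling the surviving activations by 1 / (1 - rate), often called inverted dropout, is an assumption of this sketch rather than something stated above. It keeps the expected activation unchanged while nodes are being dropped.

```python
import random

def dropout(activations, rate, rng):
    """Zero each activation with probability `rate` and rescale the survivors.

    The 1 / (1 - rate) rescaling ("inverted dropout") is an assumption of this
    sketch; it keeps the expected activation unchanged during training.
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError('rate must be in [0, 1)')
    keep_scale = 1.0 / (1.0 - rate)
    return [a * keep_scale if rng.random() >= rate else 0.0 for a in activations]

rng = random.Random(42)  # seeded so the sketch is repeatable
layer_output = [0.5, 1.0, 1.5, 2.0]  # hypothetical outputs of one layer
for step in range(3):
    # A fresh random mask is drawn at every training iteration.
    print(dropout(layer_output, rate=0.5, rng=rng))
```

Because a different subset of nodes is silenced at each iteration, the network cannot rely on any single node, which is the regularizing effect described above.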
Cross-Validation
Cross-validation is often used in conjunction with regularization to help tune hyperparameters. Take, for example, the penalization parameter in ridge and lasso regression, or the proportion of nodes to drop out at each iteration using the dropout technique with ANNs. How will you determine which parameter value to use? One way is to run models for each value of the regularization parameter and evaluate them on the test set; however, repeatedly using the test set to make modeling choices leaks information about it into the model, so the final test score becomes an optimistic estimate of performance on new data.
One popular example of cross-validation is called k-fold cross-validation. This technique gives us the ability to test our model on unseen data while retaining a test set that we will use to test at the end. Using this method, the data is divided into k subsets. In each of the k iterations, k-1 of the subsets are used as training data and the remaining subset is used as a validation set. This is repeated k times until all k subsets have been used as validation sets.
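The splitting procedure can be sketched as a small generator over row indices. This is an illustrative plain-Python version with a hypothetical function name, not a library implementation. It assumes the rows have already been shuffled, so it cuts the folds as contiguous blocks.

```python
def k_fold_indices(n_rows, k):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    if not 2 <= k <= n_rows:
        raise ValueError('k must be between 2 and n_rows')
    indices = list(range(n_rows))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size

# Ten rows and five folds: each row appears in exactly one validation set.
for fold, (train, validation) in enumerate(k_fold_indices(n_rows=10, k=5), start=1):
    print(fold, validation, train)
```

Each iteration trains on k-1 of the blocks and validates on the remaining one, so every row is used for validation exactly once.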
By using this technique, there is a significant reduction in bias, since most of the data is used for fitting. There is also a reduction in variance, since, across the k iterations, all of the data is also used for validation. Typically, there are between 5 and 10 folds, and the technique can even be stratified so that each fold preserves the class proportions, which is useful when there is a large imbalance of classes.
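Stratification can be sketched in the same spirit. The hypothetical helper below groups row indices by class and deals them out to the folds in turn, so each fold ends up with roughly the same class mix. It is an illustrative simplification with made-up labels, and real implementations typically also shuffle within each class.

```python
from collections import defaultdict

def stratified_folds(labels, k):
    """Assign each row index to one of k folds, keeping class shares similar."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(k)]
    position = 0
    # Deal the rows of each class out to the folds in turn, like cards.
    for label in sorted(by_class):
        for index in by_class[label]:
            folds[position % k].append(index)
            position += 1
    return folds

labels = [0] * 16 + [1] * 4  # hypothetical 80/20 class imbalance
for fold in stratified_folds(labels, k=4):
    positives = sum(labels[i] for i in fold)
    print(len(fold), positives)  # each fold: 5 rows, 1 positive
```

Without stratification, a fold cut from an imbalanced dataset could contain no positive examples at all, which would make its validation metrics uninformative.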
The following example shows 5-fold cross-validation with 20% of the data being held out as a test set. The remaining 80% is separated into 5 folds. Four of those folds comprise the training data, and the remaining fold is the validation data. This is repeated a total of five times until every fold has been used once for validation:
Activity 1.01: Adding Regularization to the Model
In this activity, we will utilize the same logistic regression model from the scikit-learn package. This time, however, we will add regularization to the model and search for the optimum regularization parameter—a process often called hyperparameter tuning. After training the models, we will test the predictions and compare the model evaluation metrics to those produced by the baseline model and the model without regularization.
The steps we will take are as follows:
- Load in the feature and target datasets of the online shoppers purchasing intention dataset from '../data/OSI_feats_e3.csv' and '../data/OSI_target_e2.csv'.
- Create training and test datasets for each of the feature and target datasets. The training datasets will be used to train on, and the models will be evaluated using the test datasets.
- Instantiate a model instance of the LogisticRegressionCV class of scikit-learn's linear_model package.
- Fit the model to the training data.
- Make predictions on the test dataset using the trained model.
- Evaluate the models by comparing how they scored against the true values using the evaluation metrics.
After implementing these steps, you should get the following expected output:
    l1
    Precision: 0.7300
    Recall: 0.4078
    fscore: 0.5233
    l2
    Precision: 0.7350
    Recall: 0.4106
    fscore: 0.5269
Note
The solution for this activity can be found via this link.
This activity has taught us how to use regularization in conjunction with cross-validation to appropriately score a model. We have learned how to fit a model to data using regularization and cross-validation. Regularization is an important technique for ensuring that models don't overfit the training data. Models that have been trained with regularization tend to perform better on new data, which is generally the goal of machine learning models: to predict a target when given new observations of the input data. Choosing the optimal regularization parameter may require iterating over a number of different choices.
Cross-validation is a technique that's used to determine which set of regularization parameters fits the data best. Cross-validation trains multiple models with different values for the regularization parameters on different cuts of the data. This technique helps us choose a good set of regularization parameters without touching the test set, which keeps the final evaluation unbiased while limiting the variance of the estimate.
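The selection loop itself can be illustrated with a deliberately small stand-in problem. The sketch below is not the activity's solution. It fits a one-coefficient ridge regression, which has a closed-form solution, on hypothetical seeded data, and picks the penalty strength with the lowest average validation error over 5 folds. All names and candidate values are assumptions for illustration. Note that lam here is the penalty strength itself, whereas the Cs passed to scikit-learn's LogisticRegressionCV are inverse strengths.

```python
import random

def fit_ridge_slope(xs, ys, lam):
    """Slope minimizing sum((y - w*x)**2) + lam * w**2 for a no-intercept line."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

def mean_squared_error(xs, ys, slope):
    """Average squared residual of the line y = slope * x."""
    return sum((y - slope * x) ** 2 for x, y in zip(xs, ys)) / len(xs)

def cross_validated_error(xs, ys, lam, k):
    """Average validation error of the ridge fit over k contiguous folds."""
    n = len(xs)
    errors = []
    for fold in range(k):
        lo, hi = fold * n // k, (fold + 1) * n // k
        train_x, train_y = xs[:lo] + xs[hi:], ys[:lo] + ys[hi:]
        slope = fit_ridge_slope(train_x, train_y, lam)
        errors.append(mean_squared_error(xs[lo:hi], ys[lo:hi], slope))
    return sum(errors) / k

# Hypothetical noisy data around y = 2x, generated with a fixed seed.
rng = random.Random(0)
xs = [rng.uniform(-1, 1) for _ in range(50)]
ys = [2 * x + rng.gauss(0, 0.5) for x in xs]

candidates = [0.0, 0.1, 1.0, 10.0, 100.0]  # illustrative penalty strengths
scores = {lam: cross_validated_error(xs, ys, lam, k=5) for lam in candidates}
best_lam = min(scores, key=scores.get)
print(best_lam, round(scores[best_lam], 4))
```

The test set plays no part in this loop. It would be used once, at the end, to score the model refitted with the chosen penalty strength.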