You're reading from Machine Learning for Cybersecurity Cookbook Over 80 recipes on how to implement machine learning algorithms for building security systems using Python

Product type Paperback

Published in Nov 2019

Publisher Packt

ISBN-13 9781789614671

Length 346 pages

Edition 1st Edition

Languages

Python

Tools

Metasploit

Concepts

Cybersecurity

Author (1):

Emmanuel Tsukerman

View More author details

Hyperparameter tuning with scikit-optimize

In machine learning, a hyperparameter is a parameter whose value is set before the training process begins. For example, the choice of learning rate of a gradient boosting model and the size of the hidden layer of a multilayer perceptron, are both examples of hyperparameters. By contrast, the values of other parameters are derived via training. Hyperparameter selection is important because it can have a huge effect on the model's performance.

The most basic approach to hyperparameter tuning is called a grid search. In this method, you specify a range of potential values for each hyperparameter, and then try them all out, until you find the best combination. This brute-force approach is comprehensive but computationally intensive. More sophisticated methods exist. In this recipe, you will learn how to use Bayesian optimization over hyperparameters using scikit-optimize. In contrast to a basic grid search, in Bayesian optimization, not all parameter values are tried out, but rather a fixed number of parameter settings is sampled from specified distributions. More details can be found at https://scikit-optimize.github.io/notebooks/bayesian-optimization.html.

Getting ready

The preparation for this recipe consists of installing a specific version of scikit-learn, installing xgboost, and installing scikit-optimize in pip. The command for this is as follows:

pip install scikit-learn==0.20.3 xgboost scikit-optimize pandas

How to do it...

In the following steps, you will load the standard wine dataset and use Bayesian optimization to tune the hyperparameters of an XGBoost model:

Load the wine dataset from scikit-learn:

from sklearn import datasets

wine_dataset = datasets.load_wine()
X = wine_dataset.data
y = wine_dataset.target

Import XGBoost and stratified K-fold:

import xgboost as xgb
from sklearn.model_selection import StratifiedKFold

Import BayesSearchCV from scikit-optimize and specify the number of parameter settings to test:

from skopt import BayesSearchCV

n_iterations = 50

Specify your estimator. In this case, we select XGBoost and set it to be able to perform multi-class classification:

estimator = xgb.XGBClassifier(
    n_jobs=-1,
    objective="multi:softmax",
    eval_metric="merror",
    verbosity=0,
    num_class=len(set(y)),
)

Specify a parameter search space:

search_space = {
    "learning_rate": (0.01, 1.0, "log-uniform"),
    "min_child_weight": (0, 10),
    "max_depth": (1, 50),
    "max_delta_step": (0, 10),
    "subsample": (0.01, 1.0, "uniform"),
    "colsample_bytree": (0.01, 1.0, "log-uniform"),
    "colsample_bylevel": (0.01, 1.0, "log-uniform"),
    "reg_lambda": (1e-9, 1000, "log-uniform"),
    "reg_alpha": (1e-9, 1.0, "log-uniform"),
    "gamma": (1e-9, 0.5, "log-uniform"),
    "min_child_weight": (0, 5),
    "n_estimators": (5, 5000),
    "scale_pos_weight": (1e-6, 500, "log-uniform"),
}

Specify the type of cross-validation to perform:

cv = StratifiedKFold(n_splits=3, shuffle=True)

Define BayesSearchCV using the settings you have defined:

bayes_cv_tuner = BayesSearchCV(
    estimator=estimator,
    search_spaces=search_space,
    scoring="accuracy",
    cv=cv,
    n_jobs=-1,
    n_iter=n_iterations,
    verbose=0,
    refit=True,
)

Define a callback function to print out the progress of the parameter search:

import pandas as pd
import numpy as np

def print_status(optimal_result):
    """Shows the best parameters found and accuracy attained of the search so far."""
    models_tested = pd.DataFrame(bayes_cv_tuner.cv_results_)
    best_parameters_so_far = pd.Series(bayes_cv_tuner.best_params_)
    print(
        "Model #{}\nBest accuracy so far: {}\nBest parameters so far: {}\n".format(
            len(models_tested),
            np.round(bayes_cv_tuner.best_score_, 3),
            bayes_cv_tuner.best_params_,
        )
    )

    clf_type = bayes_cv_tuner.estimator.__class__.__name__
    models_tested.to_csv(clf_type + "_cv_results_summary.csv")

Perform the parameter search:

result = bayes_cv_tuner.fit(X, y, callback=print_status)

As you can see, the following shows the output:

Model #1
 Best accuracy so far: 0.972
 Best parameters so far: {'colsample_bylevel': 0.019767840658391753, 'colsample_bytree': 0.5812505808116454, 'gamma': 1.7784704701058755e-05, 'learning_rate': 0.9050859661329937, 'max_delta_step': 3, 'max_depth': 42, 'min_child_weight': 1, 'n_estimators': 2334, 'reg_alpha': 0.02886003776717955, 'reg_lambda': 0.0008507166793122457, 'scale_pos_weight': 4.801764874750116e-05, 'subsample': 0.7188797743009225}
 
 Model #2
 Best accuracy so far: 0.972
 Best parameters so far: {'colsample_bylevel': 0.019767840658391753, 'colsample_bytree': 0.5812505808116454, 'gamma': 1.7784704701058755e-05, 'learning_rate': 0.9050859661329937, 'max_delta_step': 3, 'max_depth': 42, 'min_child_weight': 1, 'n_estimators': 2334, 'reg_alpha': 0.02886003776717955, 'reg_lambda': 0.0008507166793122457, 'scale_pos_weight': 4.801764874750116e-05, 'subsample': 0.7188797743009225}

<snip>

Model #50
 Best accuracy so far: 0.989
 Best parameters so far: {'colsample_bylevel': 0.013417868502558758, 'colsample_bytree': 0.463490250419848, 'gamma': 2.2823050161337873e-06, 'learning_rate': 0.34006478878384533, 'max_delta_step': 9, 'max_depth': 41, 'min_child_weight': 0, 'n_estimators': 1951, 'reg_alpha': 1.8321791726476395e-08, 'reg_lambda': 13.098734837402576, 'scale_pos_weight': 0.6188077759379964, 'subsample': 0.7970035272497132}

How it works...

In steps 1 and 2, we import a standard dataset, the wine dataset, as well as the libraries needed for classification. A more interesting step follows, in which we specify how long we would like the hyperparameter search to be, in terms of a number of combinations of parameters to try. The longer the search, the better the results, at the risk of overfitting and extending the computational time. In step 4, we select XGBoost as the model, and then specify the number of classes, the type of problem, and the evaluation metric. This part will depend on the type of problem. For instance, for a regression problem, we might set eval_metric = 'rmse' and drop num_class together.

Other models than XGBoost can be selected with the hyperparameter optimizer as well. In the next step, (step 5), we specify a probability distribution over each parameter that we will be exploring. This is one of the advantages of using BayesSearchCV over a simple grid search, as it allows you to explore the parameter space more intelligently. Next, we specify our cross-validation scheme (step 6). Since we are performing a classification problem, it makes sense to specify a stratified fold. However, for a regression problem, StratifiedKFold should be replaced with KFold.

Also note that a larger splitting number is preferred for the purpose of measuring results, though it will come at a computational price. In step 7, you can see additional settings that can be changed. In particular, n_jobs allows you to parallelize the task. The verbosity and the method used for scoring can be altered as well. To monitor the search process and the performance of our hyperparameter tuning, we define a callback function to print out the progress in step 8. The results of the grid search are also saved in a CSV file. Finally, we run the hyperparameter search (step 9). The output allows us to observe the parameters and the performance of each iteration of the hyperparameter search.

In this book, we will refrain from tuning the hyperparameters of classifiers. The reason is in part brevity, and in part because hyperparameter tuning here would be premature optimization, as there is no specified requirement or goal for the performance of the algorithm from the end user. Having seen how to perform it here, you can easily adapt this recipe to the application at hand.

Another prominent library for hyperparameter tuning to keep in mind is hyperopt.