Ensembling the results
Now that we have two models, all that remains is to mix them together and see whether we can improve the results. As suggested by Jahrer, we go straight for a blend, but we do not limit ourselves to a simple average of the two (since our approach ended up differing slightly from Jahrer's) and we also try to find the optimal weights for the blend. We start by importing the out-of-fold predictions and getting our evaluation function ready:
import pandas as pd
import numpy as np
from numba import jit

@jit
def eval_gini(y_true, y_pred):
    # Fast normalized Gini coefficient: sort the targets by the predictions
    # and accumulate, for each positive, the negatives ranked above it.
    y_true = np.asarray(y_true)
    y_true = y_true[np.argsort(y_pred)]
    ntrue = 0
    gini = 0
    delta = 0
    n = len(y_true)
    for i in range(n - 1, -1, -1):
        y_i = y_true[i]
        ntrue += y_i
        gini += y_i * delta
        delta += 1 - y_i
    gini = 1 - 2 * gini / (ntrue * (n - ntrue))
    return gini
lgb_oof = pd.read_csv("../input/workbook-lgb/lgb_oof.csv")
dnn_oof = pd.read_csv("../input/workbook-dae/dnn_oof.csv")
target = pd.read_csv("../input/porto-seguro-safe-driver-prediction/train.csv", usecols=['id','target'])
Once done, we convert the out-of-fold predictions of the LightGBM model and of the neural network into ranks. We do this because the normalized Gini coefficient is based on rankings (as a ROC-AUC evaluation would be), and consequently blending rankings works better than blending the predicted probabilities:
lgb_oof_ranks = (lgb_oof.target.rank() / len(lgb_oof))
dnn_oof_ranks = (dnn_oof.target.rank() / len(dnn_oof))
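As a quick illustrative check (our own addition, not part of the original solution), the Gini score of a single model does not change when its predictions are replaced by their normalized ranks, since the metric depends only on the ordering of the examples; the rank transform therefore matters only for how the two models are combined:
# Illustrative sanity check (not in the original code): the Gini score
# depends only on how the predictions order the examples, so ranking a
# single model's predictions should leave its score unchanged (up to ties).
print(eval_gini(y_true=target.target, y_pred=lgb_oof.target))
print(eval_gini(y_true=target.target, y_pred=lgb_oof_ranks))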
Now we just test whether, by combining the two models with different weights, we can get a better evaluation on the out-of-fold data:
baseline = eval_gini(y_true=target.target, y_pred=lgb_oof_ranks)
print(f"starting from an oof lgb baseline {baseline:0.5f}\n")
best_alpha = 1.0

for alpha in [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]:
    ensemble = alpha * lgb_oof_ranks + (1.0 - alpha) * dnn_oof_ranks
    score = eval_gini(y_true=target.target, y_pred=ensemble)
    print(f"lgb={alpha:0.1f} dnn={(1.0 - alpha):0.1f} -> {score:0.5f}")
    if score > baseline:
        baseline = score
        best_alpha = alpha

print(f"\nBest alpha is {best_alpha:0.1f}")
Running the snippet produces the following results:
starting from an oof lgb baseline 0.28850

lgb=0.1 dnn=0.9 -> 0.27352
lgb=0.2 dnn=0.8 -> 0.27744
lgb=0.3 dnn=0.7 -> 0.28084
lgb=0.4 dnn=0.6 -> 0.28368
lgb=0.5 dnn=0.5 -> 0.28595
lgb=0.6 dnn=0.4 -> 0.28763
lgb=0.7 dnn=0.3 -> 0.28873
lgb=0.8 dnn=0.2 -> 0.28923
lgb=0.9 dnn=0.1 -> 0.28916

Best alpha is 0.8
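As an aside, the 0.1 grid could be replaced by a bounded one-dimensional optimization over alpha; this refinement is our own addition, not part of the original solution, and we stick with the grid result in what follows:
from scipy.optimize import minimize_scalar

# Optional refinement (an assumption of ours, not in the original code):
# search the blending weight continuously instead of on a 0.1 grid by
# minimizing the negative Gini score over alpha in [0, 1].
result = minimize_scalar(
    lambda alpha: -eval_gini(
        y_true=target.target,
        y_pred=alpha * lgb_oof_ranks + (1.0 - alpha) * dnn_oof_ranks),
    bounds=(0.0, 1.0), method="bounded")
print(f"refined alpha: {result.x:0.3f} -> gini {-result.fun:0.5f}")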
It seems that putting a strong weight (0.8) on the LightGBM model and a weaker one (0.2) on the neural network produces an even better-performing blend. We immediately test this hypothesis by preparing two submissions: one blending the models with equal weights and one using the ideal weights we have just found:
lgb_submission = pd.read_csv("../input/workbook-lgb/lgb_submission.csv")
dnn_submission = pd.read_csv("../input/workbook-dae/dnn_submission.csv")
submission = pd.read_csv(
    "../input/porto-seguro-safe-driver-prediction/sample_submission.csv")
First, we try the equal weights solution, which was the strategy used by Michael Jahrer:
lgb_ranks = (lgb_submission.target.rank() / len(lgb_submission))
dnn_ranks = (dnn_submission.target.rank() / len(dnn_submission))
submission.target = lgb_ranks * 0.5 + dnn_ranks * 0.5
submission.to_csv("equal_blend_rank.csv", index=False)
It leads to a public score of 0.28393 and a private score of 0.29093, which would place us around 50th position on the final leaderboard, a bit below our expectations. Now let's try using the weights that the out-of-fold predictions helped us find:
lgb_ranks = (lgb_submission.target.rank() / len(lgb_submission))
dnn_ranks = (dnn_submission.target.rank() / len(dnn_submission))
submission.target = lgb_ranks * best_alpha + dnn_ranks * (1.0 - best_alpha)
submission.to_csv("blend_rank.csv", index=False)
This time the results lead to a public score of 0.28502 and a private score of 0.29192, which corresponds to roughly seventh position on the final leaderboard. This is a much better result: LightGBM is a strong model on its own, but it probably misses some nuances in the data that the neural network, trained on the denoised data, can provide.
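One quick way to gauge how much complementary information the neural network adds (a diagnostic of ours, not part of the original solution) is to look at the rank correlation between the two sets of out-of-fold predictions; the lower the correlation, the more the blend has to gain:
from scipy.stats import spearmanr

# Illustrative diagnostic (not in the original code): a lower rank
# correlation between the two models' out-of-fold predictions means
# they disagree more often, leaving more room for the blend to improve.
corr, _ = spearmanr(lgb_oof.target, dnn_oof.target)
print(f"Spearman correlation between LightGBM and DNN oof predictions: {corr:0.3f}")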
Exercise 6
As pointed out by CPMP in their solution (https://www.kaggle.com/competitions/porto-seguro-safe-driver-prediction/discussion/44614), depending on how you build your cross-validation, you can experience a "huge variation of Gini scores among folds." For this reason, CPMP suggests decreasing the variance of the estimates by running multiple cross-validations with different seeds and averaging the results.
As an exercise, try to modify the code we used to create more stable predictions, especially for the denoising autoencoder; a possible skeleton for the multi-seed averaging is sketched below.
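A minimal sketch of the multi-seed averaging idea (the train_and_predict function is a placeholder for your own training code, whether the LightGBM pipeline or the DAE plus the supervised network):
from sklearn.model_selection import StratifiedKFold

# Starting-point sketch (placeholders, not the original solution's code):
# run the same cross-validation with several seeds and average the
# rank-normalized out-of-fold predictions to reduce their variance.
def multi_seed_oof(X, y, train_and_predict, seeds=(0, 1, 2, 3, 4), n_splits=5):
    oof = np.zeros((len(seeds), len(y)))
    for s, seed in enumerate(seeds):
        folds = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
        for trn_idx, val_idx in folds.split(X, y):
            # train_and_predict is a placeholder returning validation predictions
            oof[s, val_idx] = train_and_predict(X[trn_idx], y[trn_idx], X[val_idx])
        print(f"seed {seed} gini: {eval_gini(y, oof[s]):0.5f}")
    ranks = [pd.Series(o).rank().values / len(y) for o in oof]
    averaged = np.mean(ranks, axis=0)
    print(f"averaged gini: {eval_gini(y, averaged):0.5f}")
    return averaged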
Exercise Notes (write down any notes or workings that will help you):