
The Kaggle Workbook: Self-learning exercises and valuable insights for Kaggle data science competitions

Authors: Konrad Banachewicz, Luca Massaron
4.8 (25 Ratings)
Paperback Feb 2023 172 pages 1st Edition
eBook
$15.99 $23.99
Paperback
$29.99
Subscription
Free Trial
Renews at $19.99p/m

What do you get with a Packt Subscription?

Free for first 7 days. $19.99 p/m after that. Cancel any time!
  • Unlimited ad-free access to the largest independent learning library in tech. Access this title and thousands more!
  • 50+ new titles added per month, including many first-to-market concepts and exclusive early access to books as they are being written.
  • Innovative learning tools, including AI book assistants, code context explainers, and text-to-speech.
  • Thousands of reference materials covering every tech concept you need to stay up to date.

Ensembling the results

Now that we have two models, what is left is to mix them together and see whether we can improve the results. As suggested by Jahrer, we go straight for a blend of the two. However, since our approach ended up differing slightly from Jahrer’s, we do not limit ourselves to a plain average: we also try to find optimal weights for the blend. We start by importing the out-of-fold predictions and getting our evaluation function ready.

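import pandas as pd
import numpy as np
from numba import jit

@jit
def eval_gini(y_true, y_pred):
    # Sort the labels by ascending predicted score
    y_true = np.asarray(y_true)
    y_true = y_true[np.argsort(y_pred)]
    ntrue = 0  # positives seen so far
    gini = 0   # positive/negative pairs where the negative is ranked higher
    delta = 0  # negatives seen so far
    n = len(y_true)
    # Walk from the highest-scored example down to the lowest
    for i in range(n-1, -1, -1):
        y_i = y_true[i]
        ntrue += y_i
        gini += y_i * delta
        delta += 1 - y_i
    # Normalize by the total number of positive/negative pairs
    gini = 1 - 2 * gini / (ntrue * (n - ntrue))
    return gini

lgb_oof = pd.read_csv("../input/workbook-lgb/lgb_oof.csv")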
dnn_oof = pd.read_csv...
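
As a rough illustration of the weight search described above, the sketch below scans a grid of blend weights over two sets of out-of-fold predictions and keeps the weight with the best normalized Gini score. It is not the book's code: the arrays are synthetic stand-ins, the names `gini_normalized`, `best_blend_weight`, `oof_a`, and `oof_b` are hypothetical, and a rank-based Gini (2 * AUC - 1, assuming no tied predictions) replaces the numba function so that the block runs on its own.

```python
import numpy as np

def gini_normalized(y_true, y_pred):
    """Normalized Gini (2 * AUC - 1) computed from ranks; assumes no tied predictions."""
    y_true = np.asarray(y_true, dtype=float)
    order = np.argsort(y_pred)
    ranks = np.empty(len(y_pred), dtype=float)
    ranks[order] = np.arange(1, len(y_pred) + 1)
    n_pos = y_true.sum()
    n_neg = len(y_true) - n_pos
    # Mann-Whitney U statistic of the positives, rescaled to an AUC
    auc = (ranks[y_true == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
    return 2 * auc - 1

def best_blend_weight(y_true, oof_a, oof_b, steps=21):
    """Scan w in [0, 1] and score the blend w * oof_a + (1 - w) * oof_b."""
    weights = np.linspace(0.0, 1.0, steps)
    scores = [gini_normalized(y_true, w * oof_a + (1 - w) * oof_b) for w in weights]
    best = int(np.argmax(scores))
    return weights[best], scores[best], scores

# Synthetic stand-ins for the two out-of-fold prediction vectors (illustrative only)
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=2000)
oof_a = y + rng.normal(0, 1.5, size=2000)
oof_b = y + rng.normal(0, 1.5, size=2000)

w, score, scores = best_blend_weight(y, oof_a, oof_b)
print(f"best weight for model A: {w:.2f}, blended Gini: {score:.4f}")
```

Because the weights are tuned on the same out-of-fold predictions that are used for scoring, the resulting score is mildly optimistic; a coarse grid like this one keeps that effect small.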

Understanding the competition and the data

The competition (https://www.kaggle.com/competitions/m5-forecasting-accuracy) ran from March to June 2020 and over 7,000 participants took part in it on Kaggle. The organizers arranged it into two separate tracks, one for point-wise prediction (accuracy track) and another one for estimating reliable values at different confidence intervals (uncertainty track). Our focus in this chapter will be to try to replicate one of the best submissions for the accuracy track and also pave the way for the uncertainty track (since it is based on the predictions of the accuracy one).

Walmart provided the data. It consisted of 42,840 daily sales time series of items hierarchically arranged into departments, categories, and stores spread across three U.S. states (the time series are somewhat correlated with each other). Along with the sales, Walmart also provided accompanying information (exogenous variables, not often provided in forecasting problems...

Understanding the Evaluation Metric

The accuracy competition introduced a new evaluation metric: Weighted Root Mean Squared Scaled Error (WRMSSE). You start from the RMSSE of each individual time series under scrutiny. The metric evaluates the deviation of the point forecasts around the mean of the realized values of the series being predicted:

$$\mathrm{RMSSE} = \sqrt{\frac{\frac{1}{h}\sum_{t=n+1}^{n+h}\left(Y_t - \hat{Y}_t\right)^2}{\frac{1}{n-1}\sum_{t=2}^{n}\left(Y_t - Y_{t-1}\right)^2}}$$

where:

  • n is the length of the training sample
  • h is the forecasting horizon (in our case, h = 28)
  • Y_t is the sales value at time t
  • \hat{Y}_t is the predicted value at time t

After estimating the RMSSE for all 42,840 time series of the competition, the Weighted RMSSE is computed as:

$$\mathrm{WRMSSE} = \sum_{i=1}^{42{,}840} w_i \cdot \mathrm{RMSSE}_i$$

where w_i is the weight of the i-th series of the competition.

In the competition guidelines (https://mofc.unic.ac.cy/m5-competition/), in regard to RMSSE and WRMSSE, it is stated that:

  • The denominator of RMSSE is computed only for the time periods for which the examined product(s) are actively sold...
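
To make the metric more concrete, here is a minimal sketch of RMSSE and WRMSSE in NumPy. The function names `rmsse` and `wrmsse` are hypothetical, and the handling of the denominator rests on an assumption: the scale is computed only from the first non-zero sale onward, which is one reading of the "actively sold" rule quoted above. The weights are passed in as given, since how they are derived is not covered here.

```python
import numpy as np

def rmsse(train, actual, forecast):
    """RMSSE of one series: forecast MSE scaled by the in-sample one-step naive MSE."""
    train = np.asarray(train, dtype=float)
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    # Assumption: drop leading zeros so the scale only covers the actively sold period
    nonzero = np.flatnonzero(train)
    if len(nonzero) == 0:
        raise ValueError("the training series contains no sales")
    active = train[nonzero[0]:]
    if len(active) < 2:
        raise ValueError("need at least two active observations to compute the scale")
    # Mean squared day-to-day change: (1 / (n - 1)) * sum((Y_t - Y_{t-1})^2)
    scale = np.mean(np.diff(active) ** 2)
    return float(np.sqrt(np.mean((actual - forecast) ** 2) / scale))

def wrmsse(rmsse_values, weights):
    """Weighted sum of the per-series RMSSE values."""
    return float(np.sum(np.asarray(weights) * np.asarray(rmsse_values)))

# Toy series: two leading zero-sales days, then four active days
score = rmsse(train=[0, 0, 2, 4, 2, 4], actual=[3, 3], forecast=[3, 5])
print(round(score, 4))  # 0.7071
print(wrmsse([1.0, 2.0], [0.25, 0.75]))  # 1.75
```

In the toy series the scale is 4 (three day-to-day changes of size 2) and the forecast MSE is 2, so the RMSSE is the square root of 0.5. A series that is constant over its active period would give a zero scale, which this sketch does not guard against.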

Examining the 4th place solution’s ideas from Monsaraida

There are many solutions available for the competition, mostly found on the competition Kaggle discussions pages. The top five methods of both challenges have also been gathered and published (except one because of proprietary rights) by the competition organizers themselves: https://github.com/Mcompetitions/M5-methods (by the way, reproducing the results of the winning submissions was a prerequisite for the collection of a competition prize).

Notably, all the Kagglers who placed in the higher ranks of the competition used LightGBM, either as their only model type or as part of blended/stacked ensembles. Its lower memory usage and faster computation gave it an advantage because of the large number of time series to process and predict. But there are also other reasons for its success. Contrary to classical methods based on ARIMA, it doesn’t require relying on the analysis...

Computing predictions for specific dates and time horizons

The plan for replicating Monsaraida’s solution is to create a notebook, customizable by input parameters, that produces the processed data needed for the training and test datasets as well as the LightGBM models for predictions. Given data from the past, the models will be trained to predict values a specific number of days into the future. The best results can be obtained by having each model learn to predict the values in a specific week range in the future. Since we have to predict up to 28 days ahead, we need one model predicting from day +1 to day +7, another predicting from day +8 to day +14, another from day +15 to day +21, and a final one handling predictions from day +22 to day +28. We will need a Kaggle notebook for each of these time ranges, so four notebooks in total. Each of these notebooks will train models to predict its future time span for each of the 10 stores that...
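
The split of the 28-day horizon into four week-long ranges can be sketched in a few lines. The helper names `week_ranges` and `model_index_for_day` are hypothetical and only illustrate the bookkeeping, not the notebooks themselves.

```python
def week_ranges(horizon=28, days_per_model=7):
    """Return the (first_day, last_day) ranges ahead, one per model."""
    return [
        (start, min(start + days_per_model - 1, horizon))
        for start in range(1, horizon + 1, days_per_model)
    ]

def model_index_for_day(day_ahead, days_per_model=7):
    """Zero-based index of the model responsible for a given day ahead (1-based)."""
    if day_ahead < 1:
        raise ValueError("day_ahead must be at least 1")
    return (day_ahead - 1) // days_per_model

ranges = week_ranges()
print(ranges)  # [(1, 7), (8, 14), (15, 21), (22, 28)]
print(model_index_for_day(10))  # 1, i.e. the model covering day +8 to day +14
```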

Assembling public and private predictions

You can see an example of how we assembled the predictions for both the public and private leaderboards here:

The only thing that changes between the public and private submissions is the last training day, which determines which days we are going to predict. The public leaderboard notebook has the last training day set to 1,913, and the private one has it set to 1,941. For validation purposes, you can also create other versions of the public notebook that use these last training days to build past holdout validation sets: [1885, 1857, 1829, 1577]. The notebook will then produce predictions that you can test locally to confirm the predictive capability of the model.
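
As a small worked example of how the last training day determines the predicted days, the sketch below derives the 28-day prediction window for each setting mentioned above. The function name `prediction_window` and the dictionary labels are hypothetical; only the day numbers come from the text.

```python
HORIZON = 28  # forecasting horizon of the competition, in days

def prediction_window(end_train_day, horizon=HORIZON):
    """Return (first_day, last_day) to predict after the last training day."""
    return end_train_day + 1, end_train_day + horizon

# Last training days quoted in the text; the keys are hypothetical labels
end_train_days = {
    "public": 1913,
    "private": 1941,
    "holdout_1": 1885,
    "holdout_2": 1857,
    "holdout_3": 1829,
    "holdout_4": 1577,
}

windows = {name: prediction_window(day) for name, day in end_train_days.items()}
for name, (first, last) in windows.items():
    print(f"{name}: predict days {first} to {last}")
```

All four holdout windows end at or before day 1,913, so the days they predict fall inside the period that the public notebook trains on, which is what allows the predictions to be checked locally.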

Exercise 6

Please try different holdout...

Summary

In this second chapter, we took on quite a complex time series competition, hence the easiest top solution we tried is actually fairly complex, and it requires coding quite a lot of processing functions. After going through the chapter, you should have a better idea of how to process time series and have them predicted using gradient boosting. Favoring gradient-boosting solutions over traditional methods when you have enough data, as with this problem, should help you create strong solutions for complex problems with hierarchical correlations, intermittent series, and availability of covariates such as events, prices, or market conditions.

In the following chapters, you will tackle even more complex Kaggle competitions, dealing with images and text. You will be amazed at how much you can learn by recreating top-scoring solutions and understanding their inner workings.

Join our book’s Discord space

Join our...


Key benefits

  • Challenge yourself to start thinking like a Kaggle Grandmaster
  • Fill your portfolio with impressive case studies that will come in handy during interviews
  • Packed with exercises and notes pages for you to enhance your skills and record key findings

Description

More than 80,000 Kaggle novices currently participate in Kaggle competitions. To help them navigate the often-overwhelming world of Kaggle, two Grandmasters put their heads together to write The Kaggle Book, which made plenty of waves in the community. Now, they’ve come back with an even more practical approach based on hands-on exercises that can help you start thinking like an experienced data scientist. In this book, you’ll get up close and personal with four extensive case studies based on past Kaggle competitions. You’ll learn how bright minds predicted which drivers would likely avoid filing insurance claims in Brazil and see how expert Kagglers used gradient-boosting methods to model Walmart unit sales time-series data. Get into computer vision by discovering different solutions for identifying the type of disease present on cassava leaves. And see how the Kaggle community created predictive algorithms to solve the natural language processing problem of subjective question-answering. You can use this workbook as a supplement alongside The Kaggle Book or on its own alongside resources available on the Kaggle website and other online communities. Whatever path you choose, this workbook will help make you a formidable Kaggle competitor.

Who is this book for?

If you’re new to Kaggle and want to sink your teeth into practical exercises, start with The Kaggle Book first. A basic understanding of the Kaggle platform, along with knowledge of machine learning and data science, is a prerequisite. This book is suitable for anyone starting their Kaggle journey or veterans trying to get better at it. Data analysts/scientists who want to do better in Kaggle competitions and secure jobs with tech giants will find this book helpful.

What you will learn

  • Take your modeling to the next level by analyzing different case studies
  • Boost your data science skillset with a curated selection of exercises
  • Combine different methods to create better solutions
  • Get a deeper insight into NLP and how it can help you solve unlikely challenges
  • Sharpen your knowledge of time-series forecasting
  • Challenge yourself to become a better data scientist

Product Details

Publication date: Feb 24, 2023
Length: 172 pages
Edition: 1st
Language: English
ISBN-13: 9781804611210
Vendor: Google

Packt Subscriptions

See our plans and pricing
$19.99 billed monthly
  • Unlimited access to Packt's library of 7,000+ practical books and videos
  • Constantly refreshed with 50+ new titles a month
  • Exclusive Early access to books as they're written
  • Solve problems while you work with advanced search and reference features
  • Offline reading on the mobile app
  • Simple pricing, no contract

$199.99 billed annually
  • Unlimited access to Packt's library of 7,000+ practical books and videos
  • Constantly refreshed with 50+ new titles a month
  • Exclusive Early access to books as they're written
  • Solve problems while you work with advanced search and reference features
  • Offline reading on the mobile app
  • Choose a DRM-free eBook or Video every month to keep
  • PLUS own as many other DRM-free eBooks or Videos as you like for just $5 each
  • Exclusive print discounts

$279.99 billed in 18 months
  • Unlimited access to Packt's library of 7,000+ practical books and videos
  • Constantly refreshed with 50+ new titles a month
  • Exclusive Early access to books as they're written
  • Solve problems while you work with advanced search and reference features
  • Offline reading on the mobile app
  • Choose a DRM-free eBook or Video every month to keep
  • PLUS own as many other DRM-free eBooks or Videos as you like for just $5 each
  • Exclusive print discounts

Frequently bought together


The Kaggle Book: $79.99
Developing Kaggle Notebooks: $49.99
The Kaggle Workbook: $29.99
Total: $159.97

Table of Contents

6 Chapters
  • The Most Renowned Tabular Competition – Porto Seguro’s Safe Driver Prediction
  • The Makridakis Competitions – M5 on Kaggle for Accuracy and Uncertainty
  • Vision Competition – Cassava Leaf Disease Competition
  • NLP Competition – Google Quest Q&A Labeling
  • Other Books You May Enjoy
  • Index

Customer reviews

Rating distribution
4.8 (25 Ratings)
  • 5 star: 92%
  • 4 star: 4%
  • 3 star: 0%
  • 2 star: 0%
  • 1 star: 4%

Top Reviews

Amznswap, Feb 26, 2023
Rating: 5/5
This book is an excellent deep-dive into the nitty-gritties of the Kaggle competition environment. The book is comprehensive, it furnishes diverse competition case-studies in domains like forecasting, NLP and Computer Vision. It provides ample context by distilling the top discussions by leaderboard rankers and complete SotA solution building practise. Particularly impressive are the in-depth sections on the metrics used by these competitions, helping the reader lucidly understand the data-science metrics used by top companies to evaluate ML models. I believe that this book will surely help any novice user get their hands dirty with practical data-science, beyond the theoretical model fundamentals covered in the Kaggle Book.
Amazon Verified review
Daniel Brooks, Mar 18, 2023
Rating: 5/5
A hands-on introduction to machine learning. The book covers 4 example competitions on the Kaggle platform - tabular data, time series analysis, computer vision, and NLP. The commentary on each is thorough and reads easily. A great read for those looking to learn more about Kaggle competitions.
Amazon Verified review
Paul Perry, Apr 12, 2023
Rating: 5/5
This Kaggle Workbook brings great depth in specific areas and Chapter 4 alone is invaluable, especially now as we all delve deeper into NLP, ChatGPT, and Transformers. This is an excellent resource and I'll tell you why. Firstly, for those aspiring to be experts in AI, ML, and NLP, it's essential to immerse yourself in Kaggle, transcend academic learning, and truly grasp what it takes to achieve top-performing solutions. For those who are new or relatively new to Kaggle, you'll definitely need the broader context provided by the Kaggle Book, but as a practitioner, the Kaggle Workbook goes deeper and dissects the top solutions to 4 specific competitions. As someone who has competed many times and reached the level of Kaggle Master, I find it incredibly valuable to have an in-depth walkthrough of a previous competition. Documenting the top solutions is a ton of work and time I don't have, and here it's as if I have a front row seat to a live competition! It also serves as a fantastic blueprint for how to study past competitions. I've tried to structure my code, document my solutions and store them on GitHub, but all I offer is messy raw code of a ton of failed experiments. But in this Kaggle Workbook Konrad and Luca do all the work and provide all the links and references, and I appreciate their expert view because I don't want to read every forum post to recreate what happened. I'm now using this book to delve into Chapter 4 and looking at the innovative techniques around Transformers. I'm glad to have found this book.
Amazon Verified review
Samuel de Zoete, Mar 21, 2023
Rating: 5/5
The Kaggle workbook has important exercises, which accompany the Kaggle book. The Kaggle book was already fantastic for any level of Data Scientist by the way, and now the workbook gives you the confidence and tools to really do it yourself. Having done a few Kaggle competitions in the past, I can highly recommend it to everyone: regardless of your current skill level, there is always something to learn. The Kaggle book and workbook managed to turn this 'always something to learn' into a super practical course in machine learning and I have to use the cliché "A must have... really you do!".
Amazon Verified review
Gianluca Rossi, Mar 03, 2023
Rating: 5/5
The book focuses on four practical examples covering tabular data, time series, NLP, and computer vision. The examples are based on high-ranked solutions in recent Kaggle competitions. The author did a great job describing the reasoning behind every code snippet and sharing tips that can be useful in Kaggle and any ML projects. I particularly appreciated the effort in making the code very readable yet concise. The exercises are educational and require the reader to stop and reason. It's an effortless read, despite the solutions being sophisticated and state-of-the-art. This is a testament to the authors' writing abilities and extensive knowledge. This book is highly recommended for anyone serious about improving their ML skills.
Amazon Verified review

FAQs

What is included in a Packt subscription?

A subscription provides you with full access to view all Packt and licensed content online, including exclusive access to Early Access titles. Depending on the tier chosen, you can also earn credits and discounts to use for owning content.

How can I cancel my subscription?

To cancel your subscription, simply go to the account page, found in the top right of the page or at https://subscription.packtpub.com/my-account/subscription. From there, you will see the ‘cancel subscription’ button in the grey box containing your subscription information.

What are credits?

Credits can be earned from reading 40 sections of any title within the payment cycle (a month starting from the day of subscription payment). You also earn a credit every month if you subscribe to our annual or 18-month plans. Credits can be used to buy books DRM-free, the same way that you would pay for a book. Your credits can be found on the subscription homepage (subscription.packtpub.com) by clicking on the ‘My Library’ dropdown and selecting ‘credits’.

What happens if an Early Access Course is cancelled?

Projects are rarely cancelled, but sometimes it's unavoidable. If an Early Access course is cancelled or excessively delayed, you can exchange your purchase for another course. For further details, please contact us here.

Where can I send feedback about an Early Access title?

If you have any feedback about the product you're reading, or Early Access in general, then please fill out a contact form here and we'll make sure the feedback gets to the right team. 

Can I download the code files for Early Access titles?

We try to ensure that all books in Early Access have code available to use, download, and fork on GitHub. This helps us be more agile in the development of the book, and helps keep the often changing code base of new versions and new technologies as up to date as possible. Unfortunately, however, there will be rare cases when it is not possible for us to have downloadable code samples available until publication.

When we publish the book, the code files will also be available to download from the Packt website.

How accurate is the publication date?

The publication date is as accurate as we can be at any point in the project. Unfortunately, delays can happen. Often those delays are out of our control, such as changes to the technology code base or delays in the tech release. We do our best to give you an accurate estimate of the publication date at any given time, and as more chapters are delivered, the more accurate the delivery date will become.

How will I know when new chapters are ready?

We'll let you know every time there has been an update to a course that you've bought in Early Access. You'll get an email to let you know there has been a new chapter, or a change to a previous chapter. The new chapters are automatically added to your account, so you can also check back there any time you're ready and download or read them online.

I am a Packt subscriber, do I get Early Access?

Yes, all Early Access content is fully available through your subscription. You will need to have a paid-for or active trial subscription in order to access all titles.

How is Early Access delivered?

Early Access is currently only available as a PDF or through our online reader. As we make changes or add new chapters, the files in your Packt account will be updated so you can download them again or view them online immediately.

How do I buy Early Access content?

Early Access is a way of us getting our content to you quicker, but the method of buying the Early Access course is still the same. Just find the course you want to buy, go through the check-out steps, and you’ll get a confirmation email from us with information and a link to the relevant Early Access courses.

What is Early Access?

Keeping up to date with the latest technology is difficult: new versions, new frameworks, new techniques. This feature gives you a head start on our content as it's being created. With Early Access you'll receive each chapter as it's written, and get regular updates throughout the product's development, as well as the final course as soon as it's ready. We created Early Access as a means of giving you the information you need, as soon as it's available. As we go through the process of developing a course, 99% of it can be ready but we can't publish until that last 1% falls into place. Early Access helps to unlock the potential of our content early, to help you start your learning when you need it most. You not only get access to every chapter as it's delivered, edited, and updated, but you'll also get the finalized, DRM-free product to download in any format you want when it's published. As a member of Packt, you'll also be eligible for our exclusive offers, including a free course every day, and discounts on new and popular titles.