
The Kaggle Workbook: Self-learning exercises and valuable insights for Kaggle data science competitions

Konrad Banachewicz, Luca Massaron
4.8 (25 Ratings)
eBook Feb 2023 172 pages 1st Edition
eBook: €11.99 €17.99
Paperback: €22.99
Subscription: Free Trial, renews at €18.99 p/m

What do you get with eBook?

  • Instant access to your Digital eBook purchase
  • Download this book in EPUB and PDF formats
  • Access this title in our online reader with advanced features
  • DRM FREE - Read whenever, wherever and however you want
  • AI Assistant (beta) to help accelerate your learning

Ensembling the results

Now that we have two models, what's left is to blend them and see whether the results improve. As suggested by Jahrer, we go straight for a blend, but we do not limit ourselves to a simple average of the two (our approach ended up differing slightly from Jahrer's): we will also try to find optimal weights for the blend. We start by importing the out-of-fold predictions and getting our evaluation function ready.

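import pandas as pd
import numpy as np
from numba import jit

@jit
def eval_gini(y_true, y_pred):
    # Normalized Gini coefficient, equivalent to 2 * AUC - 1 (ignoring ties)
    y_true = np.asarray(y_true)
    # Sort the labels by ascending prediction
    y_true = y_true[np.argsort(y_pred)]
    ntrue = 0  # positives seen so far
    gini = 0   # misordered (positive, negative) pairs
    delta = 0  # negatives seen so far
    n = len(y_true)
    # Walk from the highest prediction down to the lowest
    for i in range(n-1, -1, -1):
        y_i = y_true[i]
        ntrue += y_i
        gini += y_i * delta
        delta += 1 - y_i
    gini = 1 - 2 * gini / (ntrue * (n - ntrue))
    return gini

# Load the out-of-fold predictions of the two models
lgb_oof = pd.read_csv("../input/workbook-lgb/lgb_oof.csv")
dnn_oof = pd.read_csv...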
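
To make the idea of searching for blend weights more concrete, here is a minimal sketch that scores a grid of weights on out-of-fold predictions. It is not the authors' code: the arrays are synthetic stand-ins for the real out-of-fold files, the names `gini` and `best_blend_weight` and the 21-step grid are assumptions chosen for illustration, and the Gini is computed as 2 * AUC - 1 with a rank-sum formula that assumes no tied predictions.

```python
import numpy as np

def gini(y_true, y_pred):
    """Normalized Gini as 2 * AUC - 1 (rank-sum AUC, assumes no tied predictions)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred, dtype=float)
    order = np.argsort(y_pred)
    ranks = np.empty(len(y_pred), dtype=float)
    ranks[order] = np.arange(1, len(y_pred) + 1)  # rank 1 = lowest prediction
    n_pos = y_true.sum()
    n_neg = len(y_true) - n_pos
    auc = (ranks[y_true == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
    return 2 * auc - 1

def best_blend_weight(y_true, pred_a, pred_b, n_steps=21):
    """Hypothetical helper: try weights w in [0, 1] for w * a + (1 - w) * b."""
    grid = np.linspace(0.0, 1.0, n_steps)
    scores = [gini(y_true, w * pred_a + (1.0 - w) * pred_b) for w in grid]
    best = int(np.argmax(scores))
    return float(grid[best]), float(scores[best])

# Synthetic stand-ins for two models' out-of-fold predictions (illustration only)
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=2000)
pred_a = y + rng.normal(0.0, 1.5, size=2000)
pred_b = y + rng.normal(0.0, 1.5, size=2000)

w, score = best_blend_weight(y, pred_a, pred_b)
print(f"best weight for model A: {w:.2f}, blended Gini: {score:.4f}")
```

Because the grid includes the weights 0 and 1, the best blend can never score worse on the out-of-fold data than either model alone. The weights are tuned on the same data used to score them, so the reported gain is likely to be somewhat optimistic.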

Understanding the competition and the data

The competition (https://www.kaggle.com/competitions/m5-forecasting-accuracy) ran from March to June 2020, and over 7,000 participants took part in it on Kaggle. The organizers arranged it into two separate tracks: one for point-wise prediction (the accuracy track) and another for estimating reliable values at different confidence intervals (the uncertainty track). Our focus in this chapter is to replicate one of the best submissions for the accuracy track, which also paves the way for the uncertainty track (since it builds on the predictions of the accuracy one).

Walmart provided the data. It consisted of 42,840 daily sales time series of items hierarchically arranged into departments, categories, and stores spread across three U.S. states (the time series are somewhat correlated with each other). Along with the sales, Walmart also provided accompanying information (exogenous variables, not often provided in forecasting problems...

Understanding the Evaluation Metric

The accuracy competition introduced a new evaluation metric: the Weighted Root Mean Squared Scaled Error (WRMSSE). You start from the RMSSE of each individual time series under scrutiny. The metric evaluates the deviation of the point forecasts around the mean of the realized values of the series being predicted:

$$RMSSE = \sqrt{\frac{\frac{1}{h}\sum_{t=n+1}^{n+h}\left(Y_t - \hat{Y}_t\right)^2}{\frac{1}{n-1}\sum_{t=2}^{n}\left(Y_t - Y_{t-1}\right)^2}}$$

where:

  • $n$ is the length of the training sample
  • $h$ is the forecasting horizon (in our case, $h = 28$)
  • $Y_t$ is the sales value at time $t$; $\hat{Y}_t$ is the predicted value at time $t$

After estimating the RMSSE for all the 42,840 time series of the competition, the Weighted RMSSE is computed as:

$$WRMSSE = \sum_{i=1}^{42{,}840} w_i \cdot RMSSE_i$$

where $w_i$ is the weight of the $i$-th series of the competition.

In the competition guidelines (https://mofc.unic.ac.cy/m5-competition/), in regard to RMSSE and WRMSSE, it is stated that:

  • The denominator of RMSSE is computed only for the time periods for which the examined product(s) are actively sold...
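
The metric can be sketched in a few lines of NumPy. The sketch below computes the RMSSE as the square root of the mean squared forecast error over the horizon divided by the mean squared one-step difference of the training series, then combines the per-series scores with weights. The function names `rmsse` and `wrmsse` are hypothetical, and trimming the leading zeros of the training series is our reading of the "actively sold" rule quoted above, not an official implementation. Edge cases (an all-zero series, or fewer than two active observations) are deliberately left unhandled.

```python
import numpy as np

def rmsse(train, actual, forecast):
    """RMSSE of one series: forecast MSE scaled by the naive one-step MSE on train."""
    train = np.asarray(train, dtype=float)
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    # Assumption: "actively sold" starts at the first non-zero observation
    nonzero = np.flatnonzero(train)
    if nonzero.size:
        train = train[nonzero[0]:]
    # (1 / (n - 1)) * sum of squared one-step differences
    scale = np.mean(np.diff(train) ** 2)
    # (1 / h) * sum of squared forecast errors over the horizon
    mse = np.mean((actual - forecast) ** 2)
    return float(np.sqrt(mse / scale))

def wrmsse(trains, actuals, forecasts, weights):
    """Weighted sum of the per-series RMSSE values."""
    scores = [rmsse(t, a, f) for t, a, f in zip(trains, actuals, forecasts)]
    return float(np.dot(weights, scores))

# Toy example with two short series and a horizon of 2 (made-up numbers)
trains = [[0, 0, 1, 3, 2], [2, 2, 4]]
actuals = [[2, 4], [3, 3]]
forecasts = [[1, 2], [3, 3]]
weights = [0.25, 0.75]

print(rmsse(trains[0], actuals[0], forecasts[0]))
print(wrmsse(trains, actuals, forecasts, weights))
```

In the toy example, the first series has a forecast error exactly as large as its typical day-to-day change, so its RMSSE is 1, while the second series is forecast perfectly and scores 0.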

Examining the 4th place solution’s ideas from Monsaraida

Many solutions are available for the competition, mostly on the competition's Kaggle discussion pages. The top five methods of both challenges have also been gathered and published (except one because of proprietary rights) by the competition organizers themselves: https://github.com/Mcompetitions/M5-methods (by the way, reproducing the results of the winning submissions was a prerequisite for the collection of a competition prize).

Notably, all the Kagglers who placed in the higher ranks of the competitions used LightGBM, either as their only model type or within blended/stacked ensembles, because of its lower memory usage and speed of computation, which gave it an advantage given the large number of time series to process and predict. But there are also other reasons for its success. Contrary to classical methods based on ARIMA, it doesn’t require relying on the analysis...

Computing predictions for specific dates and time horizons

The plan for replicating Monsaraida’s solution is to create a notebook customizable by input parameters to produce the necessary processed data for training and test datasets and the LightGBM models for predictions. The models, given data in the past, will be trained to learn to predict values in a specific number of days in the future. The best results can be obtained by having each model learn to predict the values in a specific week range in the future. Since we have to predict up to 28 days ahead, we need a model predicting from day +1 to day +7 in the future, then another one able to predict from day +8 to day +14, another from day +15 to +21, and finally, another one capable of handling predictions from day +22 to day +28. We will need a Kaggle notebook for each of these time ranges, thus we need four notebooks. Each of these notebooks will be trained to predict the future time span for each of the 10 stores that...
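
The week-by-week split described above can be sketched as follows. This is only an illustration under stated assumptions: the helper names `week_ranges` and `model_index_for_day` are hypothetical, the store labels are placeholders rather than the real store identifiers, and fitting one model per store and week range is our reading of the plan, not a confirmed detail of Monsaraida's code.

```python
def week_ranges(horizon=28, week_len=7):
    """Split the forecasting horizon into consecutive (first_day, last_day) ranges."""
    return [(start, start + week_len - 1) for start in range(1, horizon + 1, week_len)]

def model_index_for_day(day_ahead, week_len=7):
    """Index of the week-range model responsible for a given day ahead (1-based)."""
    return (day_ahead - 1) // week_len

ranges = week_ranges()
print(ranges)

# Placeholder labels for the 10 stores (not the actual store IDs)
stores = [f"store_{i}" for i in range(10)]

# One training job per (store, week range) pair, under the assumption above
jobs = [(store, week) for week in ranges for store in stores]
print(len(jobs), "models in total")
print("day +10 is handled by model", model_index_for_day(10))
```

Under these assumptions, four week ranges times ten stores gives 40 separate models, which is why parameterizing a single notebook is more practical than writing each variant by hand.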

Assembling public and private predictions

You can see an example of how we assembled the predictions for both the public and private leaderboards here:

The only difference between the public and private submissions is the last training day, which determines the days we are going to predict. The public leaderboard notebook has the last training day set to 1913, and the private one has it set to 1941. Just for validation purposes, you can also create other versions of the public notebook using these last training days to build past holdout validation sets: [1885, 1857, 1829, 1577]. The notebook will then produce predictions that you can test locally to confirm the predictive capability of the model.
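
As a quick sanity check on these dates, the following sketch lists the 28-day window that each last training day implies. The helper name `prediction_window` and the dictionary labels are assumptions for illustration; only the day numbers come from the text.

```python
def prediction_window(end_train, horizon=28):
    """First and last day index to predict, given the last training day."""
    return end_train + 1, end_train + horizon

# Last training days mentioned in the text; the labels are illustrative
end_train_days = {
    "private": 1941,
    "public": 1913,
    "holdout_1": 1885,
    "holdout_2": 1857,
    "holdout_3": 1829,
    "holdout_4": 1577,
}

for name, end_train in end_train_days.items():
    first, last = prediction_window(end_train)
    print(f"{name}: train up to day {end_train}, predict days {first} to {last}")
```

The first three holdout days step back in blocks of 28 days from the public setting, so each of those windows ends exactly where the next one's training data ends, while 1577 sits much further back in the history.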

Exercise 6

Please try different holdout...

Summary

In this second chapter, we took on quite a complex time series competition: even the easiest top solution we tried is fairly complex and requires coding quite a lot of processing functions. After going through the chapter, you should have a better idea of how to process time series and predict them using gradient boosting. Favoring gradient-boosting solutions over traditional methods when you have enough data, as with this problem, should help you create strong solutions for complex problems with hierarchical correlations, intermittent series, and covariates such as events, prices, or market conditions.

In the following chapters, you will tackle even more complex Kaggle competitions, dealing with images and text. You will be amazed at how much you can learn by recreating top-scoring solutions and understanding their inner workings.


Key benefits

  • Challenge yourself to start thinking like a Kaggle Grandmaster
  • Fill your portfolio with impressive case studies that will come in handy during interviews
  • Packed with exercises and notes pages for you to enhance your skills and record key findings

Description

More than 80,000 Kaggle novices currently participate in Kaggle competitions. To help them navigate the often-overwhelming world of Kaggle, two Grandmasters put their heads together to write The Kaggle Book, which made plenty of waves in the community. Now, they’ve come back with an even more practical approach based on hands-on exercises that can help you start thinking like an experienced data scientist.

In this book, you’ll get up close and personal with four extensive case studies based on past Kaggle competitions. You’ll learn how bright minds predicted which drivers would likely avoid filing insurance claims in Brazil and see how expert Kagglers used gradient-boosting methods to model Walmart unit sales time-series data. Get into computer vision by discovering different solutions for identifying the type of disease present on cassava leaves. And see how the Kaggle community created predictive algorithms to solve the natural language processing problem of subjective question-answering.

You can use this workbook as a supplement alongside The Kaggle Book or on its own alongside resources available on the Kaggle website and other online communities. Whatever path you choose, this workbook will help make you a formidable Kaggle competitor.

Who is this book for?

If you’re new to Kaggle and want to sink your teeth into practical exercises, start with The Kaggle Book first. A basic understanding of the Kaggle platform, along with knowledge of machine learning and data science, is a prerequisite. This book is suitable for anyone starting their Kaggle journey and for veterans trying to get better at it. Data analysts and data scientists who want to do better in Kaggle competitions and secure jobs with tech giants will find this book helpful.

What you will learn

  • Take your modeling to the next level by analyzing different case studies
  • Boost your data science skillset with a curated selection of exercises
  • Combine different methods to create better solutions
  • Get a deeper insight into NLP and how it can help you solve unlikely challenges
  • Sharpen your knowledge of time-series forecasting
  • Challenge yourself to become a better data scientist

Product Details

Publication date : Feb 24, 2023
Length : 172 pages
Edition : 1st
Language : English
ISBN-13 : 9781804610114
Vendor : Google


Packt Subscriptions

See our plans and pricing

€18.99 billed monthly
  • Unlimited access to Packt's library of 7,000+ practical books and videos
  • Constantly refreshed with 50+ new titles a month
  • Exclusive Early access to books as they're written
  • Solve problems while you work with advanced search and reference features
  • Offline reading on the mobile app
  • Simple pricing, no contract

€189.99 billed annually
  • Unlimited access to Packt's library of 7,000+ practical books and videos
  • Constantly refreshed with 50+ new titles a month
  • Exclusive Early access to books as they're written
  • Solve problems while you work with advanced search and reference features
  • Offline reading on the mobile app
  • Choose a DRM-free eBook or Video every month to keep
  • PLUS own as many other DRM-free eBooks or Videos as you like for just €5 each
  • Exclusive print discounts

€264.99 billed in 18 months
  • Unlimited access to Packt's library of 7,000+ practical books and videos
  • Constantly refreshed with 50+ new titles a month
  • Exclusive Early access to books as they're written
  • Solve problems while you work with advanced search and reference features
  • Offline reading on the mobile app
  • Choose a DRM-free eBook or Video every month to keep
  • PLUS own as many other DRM-free eBooks or Videos as you like for just €5 each
  • Exclusive print discounts

Frequently bought together

The Kaggle Book
€59.99
Developing Kaggle Notebooks
€37.99
The Kaggle Workbook
€22.99
Total €120.97

Table of Contents

6 Chapters
  • The Most Renowned Tabular Competition – Porto Seguro’s Safe Driver Prediction
  • The Makridakis Competitions – M5 on Kaggle for Accuracy and Uncertainty
  • Vision Competition – Cassava Leaf Disease Competition
  • NLP Competition – Google Quest Q&A Labeling
  • Other Books You May Enjoy
  • Index

Customer reviews

Rating distribution
4.8 (25 Ratings)
5 star 92%
4 star 4%
3 star 0%
2 star 0%
1 star 4%

Top Reviews

Amznswap Feb 26, 2023
5 out of 5 stars
This book is an excellent deep-dive into the nitty-gritties of the Kaggle competition environment. The book is comprehensive, it furnishes diverse competition case-studies in domains like forecasting, NLP and Computer Vision. It provides ample context by distilling the top discussions by leaderboard rankers and complete SotA solution building practise. Particularly impressive are the in-depth sections on the metrics used by these competitions, helping the reader lucidly understand the data-science metrics used by top companies to evaluate ML models. I believe that this book will surely help any novice user get their hands dirty with practical data-science, beyond the theoretical model fundamentals covered in the Kaggle Book.
Amazon Verified review
Daniel Brooks Mar 18, 2023
5 out of 5 stars
A hands-on introduction to machine learning. The book covers 4 example competitions on the Kaggle platform - tabular data, time series analysis, computer vision, and NLP. The commentary on each is thorough and reads easily. A great read for those looking to learn more about Kaggle competitions.
Amazon Verified review
Paul Perry Apr 12, 2023
5 out of 5 stars
This Kaggle Workbook brings great depth in specific areas and Chapter 4 alone is invaluable, especially now as we all delve deeper into NLP, ChatGPT, and Transformers. This is an excellent resource and I'll tell you why. Firstly, for those aspiring to be experts in AI, ML, and NLP, it's essential to immerse yourself in Kaggle, transcend academic learning, and truly grasp what it takes to achieve top-performing solutions. For those who are new or relatively new to Kaggle, you'll definitely need the broader context provided by the Kaggle Book, but as a practitioner, the Kaggle Workbook goes deeper and dissects the top solutions to 4 specific competitions. As someone who has competed many times and reached the level of Kaggle Master, I find it incredibly valuable to have an in-depth walkthrough of a previous competition. Documenting the top solutions is a ton of work and time I don't have, and here it's as if I have a front row seat to a live competition! It also serves as fantastic blueprint for how to study past competitions. I've tried to structure my code, document my solutions and store them on GitHub, but all I offer is messy raw code of a ton of failed experiments. But in this Kaggle Workbook Konrad and Luca do all the work and provide all the links an references, and I appreciate their expert view because I don't want to read every forum post to recreate what happened. I'm now using this book to delve into Chapter 4 and looking at the innovative techniques around Transformers. I'm glad to have found this book.
Amazon Verified review
Samuel de Zoete Mar 21, 2023
5 out of 5 stars
The Kaggle workbook has important exercises, which accompanies the Kaggle book. The Kaggle book was already fantastic for any level of Data Scientist by the way, and now the workbook gives you the confidence and tools to do it really yourself. Having done a few Kaggle competitions in the past, and I can highly recommend it to everyone, regardless your current skill level, there is always something to learn. The Kaggle book and workbook managed to turn this 'always something to learn ' into a super practical course in machine learning and I have to use the cliché "A must have... really you do!" .
Amazon Verified review
Gianluca Rossi Mar 03, 2023
5 out of 5 stars
The book focuses on four practical examples covering tabular data, time series, NLP, and computer vision. The examples are based on high-ranked solutions in recent Kaggle competitions. The author did a great job describing the reasoning behind every code snippet and sharing tips that can be useful in Kaggle and any ML projects. I particularly appreciated the effort in making the code very readable yet concise. The exercises are educational and require the reader to stop and reason. It's an effortless read, despite the solutions being sophisticated and state-of-the-art. This is a testament to the authors writing abilities and extensive knowledge. This book is highly recommended for anyone serious about improving their ML skills.
Amazon Verified review

Get free access to the Packt library of 7,500+ books and video courses for 7 days!

FAQs

How do I buy and download an eBook?

Where there is an eBook version of a title available, you can buy it from the book details for that title. Add either the standalone eBook or the eBook and print book bundle to your shopping cart. Your eBook will show in your cart as a product on its own. After completing checkout and payment in the normal way, you will receive your receipt on the screen containing a link to a personalised PDF download file. This link will remain active for 30 days. You can download backup copies of the file by logging in to your account at any time.

If you already have Adobe Reader installed, then clicking on the link will download and open the PDF file directly. If you don't, then save the PDF file on your machine and download the Reader to view it.

Please Note: Packt eBooks are non-returnable and non-refundable.

Packt eBook and Licensing

When you buy an eBook from Packt Publishing, completing your purchase means you accept the terms of our licence agreement. Please read the full text of the agreement. In it we have tried to balance the need for the eBook to be usable for you, the reader, with our need to protect our rights as publishers and those of our authors. In summary, the agreement says:

  • You may make copies of your eBook for your own use onto any machine
  • You may not pass copies of the eBook on to anyone else

How can I make a purchase on your website?

If you want to purchase a video course, eBook or Bundle (Print+eBook), please follow the steps below:

  1. Register on our website using your email address and a password.
  2. Search for the title by name or ISBN using the search option.
  3. Select the title you want to purchase.
  4. Choose the format you wish to purchase the title in; if you order the Print Book, you get a free eBook copy of the same title.
  5. Proceed with the checkout process (payment to be made using Credit Card, Debit Card, or PayPal).

Where can I access support around an eBook?

  • If you experience a problem with using or installing Adobe Reader, then contact Adobe directly.
  • To view the errata for the book, see www.packtpub.com/support and view the pages for the title you have.
  • To view your account details or to download a new copy of the book, go to www.packtpub.com/account
  • To contact us directly if a problem is not resolved, use www.packtpub.com/contact-us

What eBook formats does Packt support?

Our eBooks are currently available in a variety of formats such as PDF and ePub. In the future, this may well change with trends and developments in technology, but please note that our PDFs are not in Adobe eBook Reader format, which has greater restrictions on security.

You will need to use Adobe Reader v9 or later in order to read Packt's PDF eBooks.

What are the benefits of eBooks?

  • You can get the information you need immediately
  • You can easily take them with you on a laptop
  • You can download them an unlimited number of times
  • You can print them out
  • They are copy-paste enabled
  • They are searchable
  • There is no password protection
  • They are lower priced than print
  • They save resources and space

What is an eBook?

Packt eBooks are a complete electronic version of the print edition, available in PDF and ePub formats. Every piece of content down to the page numbering is the same. Because we save the costs of printing and shipping the book to you, we are able to offer eBooks at a lower cost than print editions.

When you have purchased an eBook, simply log in to your account and click on the link in Your Download Area. We recommend saving the file to your hard drive before opening it.

For optimal viewing of our eBooks, we recommend you download and install the free Adobe Reader version 9.