
The Kaggle Workbook: Self-learning exercises and valuable insights for Kaggle data science competitions

By Konrad Banachewicz and Luca Massaron
$29.99
Rating: 4.8 (25 Ratings)
Paperback Feb 2023 172 pages 1st Edition
eBook
$15.99 $23.99
Paperback
$29.99
Subscription
Free Trial
Renews at $19.99p/m

What do you get with Print?

  • Instant access to your digital eBook copy whilst your Print order is shipped
  • Paperback book shipped to your preferred address
  • Download this book in EPUB and PDF formats
  • Access this title in our online reader with advanced features
  • DRM FREE - Read whenever, wherever and however you want
  • AI Assistant (beta) to help accelerate your learning

Ensembling the results

Now that we have two models, what is left is to blend them and see whether the results improve. As suggested by Jahrer, we go straight for a blend, but we do not limit ourselves to a plain average of the two (our approach ended up differing slightly from Jahrer’s): we will also try to find optimal weights for the blend. We start by importing the out-of-fold predictions and preparing our evaluation function.

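import numpy as np
import pandas as pd
from numba import jit


@jit
def eval_gini(y_true, y_pred):
    # Normalized Gini coefficient (equals 2 * AUC - 1 when predictions have no ties)
    y_true = np.asarray(y_true)
    # Order the labels by increasing predicted score
    y_true = y_true[np.argsort(y_pred)]
    ntrue = 0  # positives seen so far
    gini = 0   # (positive, negative) pairs ranked in the wrong order
    delta = 0  # negatives seen so far
    n = len(y_true)
    # Walk from the highest prediction down to the lowest
    for i in range(n-1, -1, -1):
        y_i = y_true[i]
        ntrue += y_i
        gini += y_i * delta
        delta += 1 - y_i
    # Turn the share of wrongly ordered pairs into a score in [-1, 1]
    gini = 1 - 2 * gini / (ntrue * (n - ntrue))
    return gini
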
lgb_oof = pd.read_csv("../input/workbook-lgb/lgb_oof.csv")
dnn_oof = pd.read_csv...
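
The search for blend weights described above can be sketched as a grid search over a single weight. This is a minimal illustration under stated assumptions, not the authors' notebook code: the predictions are synthetic stand-ins for the out-of-fold files, `best_blend_weight` is a hypothetical helper, and `eval_gini` is re-implemented in plain NumPy so the snippet runs without numba.

```python
import numpy as np

def eval_gini(y_true, y_pred):
    # Plain-NumPy version of the normalized Gini from the text (no numba).
    y_sorted = np.asarray(y_true)[np.argsort(y_pred)]
    ntrue = gini = delta = 0
    for y_i in y_sorted[::-1]:
        ntrue += y_i
        gini += y_i * delta
        delta += 1 - y_i
    n = len(y_sorted)
    return 1 - 2 * gini / (ntrue * (n - ntrue))

def best_blend_weight(y_true, pred_a, pred_b, n_steps=21):
    # Hypothetical helper: score w * pred_a + (1 - w) * pred_b on a grid of w.
    weights = np.linspace(0.0, 1.0, n_steps)
    scores = [eval_gini(y_true, w * pred_a + (1 - w) * pred_b) for w in weights]
    best = int(np.argmax(scores))
    return float(weights[best]), float(scores[best])

# Synthetic stand-ins for two models' out-of-fold predictions (assumption).
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=2000)
pred_a = y + rng.normal(0.0, 2.0, size=2000)
pred_b = y + rng.normal(0.0, 2.0, size=2000)

w, score = best_blend_weight(y, pred_a, pred_b)
print(f"best weight for model A: {w:.2f}, blended Gini: {score:.4f}")
```

Since the grid includes the weights 0 and 1, the selected blend can never score below the better of the two single models on the same out-of-fold data. Because the Gini metric depends only on the ordering of predictions, blending ranks instead of raw scores is a common variant.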

Understanding the competition and the data

The competition (https://www.kaggle.com/competitions/m5-forecasting-accuracy) ran from March to June 2020, and over 7,000 participants took part in it on Kaggle. The organizers arranged it into two separate tracks: one for point-wise prediction (the accuracy track) and another for estimating reliable values at different confidence intervals (the uncertainty track). Our focus in this chapter is to replicate one of the best submissions for the accuracy track and to pave the way for the uncertainty track (since it is based on the predictions of the accuracy one).

Walmart provided the data. It consisted of 42,840 daily sales time series of items hierarchically arranged into departments, categories, and stores spread across three U.S. states (the time series are somewhat correlated with each other). Along with the sales, Walmart also provided accompanying information (exogenous variables, not often provided in forecasting problems...

Understanding the Evaluation Metric

The accuracy competition introduced a new evaluation metric: Weighted Root Mean Squared Scaled Error (WRMSSE). You start from the RMSSE of each individual time series under scrutiny. The metric evaluates the deviation of the point forecasts around the mean of the realized values of the series being predicted:

$$\mathrm{RMSSE} = \sqrt{\frac{\frac{1}{h}\sum_{t=n+1}^{n+h}\left(Y_t - \hat{Y}_t\right)^2}{\frac{1}{n-1}\sum_{t=2}^{n}\left(Y_t - Y_{t-1}\right)^2}}$$

where:

  • $n$ is the length of the training sample
  • $h$ is the forecasting horizon (in our case, $h = 28$)
  • $Y_t$ is the sales value at time $t$; $\hat{Y}_t$ is the predicted value at time $t$

After estimating the RMSSE for all the 42,840 time series of the competition, the Weighted RMSSE is computed as:

$$\mathrm{WRMSSE} = \sum_{i=1}^{42{,}840} w_i \cdot \mathrm{RMSSE}_i$$

where $w_i$ is the weight of the $i$-th series of the competition.

In the competition guidelines (https://mofc.unic.ac.cy/m5-competition/), in regard to RMSSE and WRMSSE, it is stated that:

  • The denominator of RMSSE is computed only for the time periods for which the examined product(s) are actively sold...
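
The two definitions above can be sketched in a few lines of NumPy. This is a minimal illustration, not the official scoring code: the function names `rmsse` and `wrmsse` are hypothetical, the toy numbers are made up, and "actively sold" is assumed here to mean the periods from the first non-zero sale onward.

```python
import numpy as np

def rmsse(train, actual, forecast):
    """RMSSE of one series: forecast MSE scaled by the training one-step naive MSE."""
    train = np.asarray(train, dtype=float)
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    # Assumption: drop the leading zeros, i.e. the days before the first sale.
    nonzero = np.flatnonzero(train)
    if nonzero.size:
        train = train[nonzero[0]:]
    # Denominator: mean squared day-to-day difference over the training period.
    scale = np.mean(np.diff(train) ** 2)
    # Numerator: mean squared forecast error over the h forecast days.
    return float(np.sqrt(np.mean((actual - forecast) ** 2) / scale))

def wrmsse(rmsse_values, weights):
    """Weighted sum of the per-series RMSSE values."""
    return float(np.sum(np.asarray(weights) * np.asarray(rmsse_values)))

# Toy example with made-up numbers.
score = rmsse(train=[0, 0, 1, 3, 2, 4], actual=[3, 5], forecast=[3, 2])
total = wrmsse([1.0, 2.0], [0.25, 0.75])
print(score, total)
```

A series with fewer than two active training days has an undefined scale in this sketch, so real scoring code would need to handle that case explicitly.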

Examining the 4th place solution’s ideas from Monsaraida

There are many solutions available for the competition, mostly found on the competition's Kaggle discussion pages. The top five methods of both challenges have also been gathered and published (except one, because of proprietary rights) by the competition organizers themselves: https://github.com/Mcompetitions/M5-methods (by the way, reproducing the results of the winning submissions was a prerequisite for collecting a competition prize).

Notably, all the Kagglers who placed in the higher ranks of the competitions used LightGBM, either as their only model type or in blended/stacked ensembles, because of its low memory usage and speed of computation, which gave it an advantage given the large number of time series to process and predict. But there are also other reasons for its success. Contrary to classical methods based on ARIMA, it doesn’t require relying on the analysis...

Computing predictions for specific dates and time horizons

The plan for replicating Monsaraida’s solution is to create a notebook, customizable through input parameters, that produces the processed training and test datasets and the LightGBM models used for prediction. Given past data, the models are trained to predict values a specific number of days into the future. The best results are obtained by having each model learn to predict the values in a specific week range in the future. Since we have to predict up to 28 days ahead, we need one model predicting from day +1 to day +7, another from day +8 to day +14, another from day +15 to day +21, and a final one from day +22 to day +28. We will need a Kaggle notebook for each of these time ranges, so we need four notebooks. Each of these notebooks will be trained to predict the future time span for each of the 10 stores that...
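
The split of the 28-day horizon into weekly blocks can be sketched as follows. This is an illustrative sketch only: `week_ranges` is a hypothetical helper, the store identifiers are numeric placeholders rather than the real store names, and pairing every week range with every store is an assumption based on the description above.

```python
from itertools import product

def week_ranges(horizon=28, block=7):
    """Split days +1..+horizon into consecutive (first_day, last_day) blocks."""
    return [(start, min(start + block - 1, horizon))
            for start in range(1, horizon + 1, block)]

ranges = week_ranges()
store_ids = list(range(10))  # placeholders for the 10 stores (assumption)

# One (week range, store) pair per model to train.
jobs = list(product(ranges, store_ids))

print(ranges)     # the four week ranges, one per notebook
print(len(jobs))  # number of (week range, store) combinations
```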

Assembling public and private predictions

You can see an example of how we assembled the predictions for both the public and private leaderboards here:

What changes between the public and private submissions is just the last training day, which determines the days we are going to predict. The public leaderboard notebook has the last training day set to 1,913, and the private one has it set to 1,941. For validation purposes, you can also create other versions of the public notebook that use these last training days to build past holdout validation sets: [1885, 1857, 1829, 1577]. The notebook will then produce predictions that you can test locally to confirm the predictive capability of the model.
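
How the last training day determines the predicted days can be sketched with a small helper. The function name `forecast_window` is hypothetical and the 28-day horizon is taken from the competition description; the day numbers are the ones quoted above.

```python
def forecast_window(last_train_day, horizon=28):
    """Return the first and last day number predicted after last_train_day."""
    return last_train_day + 1, last_train_day + horizon

public = forecast_window(1913)
private = forecast_window(1941)

# Past holdout sets mentioned in the text, each with its own 28-day window.
holdouts = {day: forecast_window(day) for day in [1885, 1857, 1829, 1577]}

print("public:", public)
print("private:", private)
for day, window in holdouts.items():
    print("holdout ending at", day, "->", window)
```

Every holdout window ends on or before day 1,913, so the true sales for those days are available in the training data for local scoring.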

Exercise 6

Please try different holdout...

Summary

In this second chapter, we took on quite a complex time series competition: even the easiest top solution we tried is fairly complex and requires coding quite a lot of processing functions. After going through the chapter, you should have a better idea of how to process time series and predict them using gradient boosting. Favoring gradient-boosting solutions over traditional methods when you have enough data, as with this problem, should help you create strong solutions for complex problems with hierarchical correlations, intermittent series, and availability of covariates such as events, prices, or market conditions.

In the following chapters, you will tackle even more complex Kaggle competitions, dealing with images and text. You will be amazed at how much you can learn by recreating top-scoring solutions and understanding their inner workings.


Key benefits

  • Challenge yourself to start thinking like a Kaggle Grandmaster
  • Fill your portfolio with impressive case studies that will come in handy during interviews
  • Packed with exercises and notes pages for you to enhance your skills and record key findings

Description

More than 80,000 Kaggle novices currently participate in Kaggle competitions. To help them navigate the often-overwhelming world of Kaggle, two Grandmasters put their heads together to write The Kaggle Book, which made plenty of waves in the community. Now, they’ve come back with an even more practical approach based on hands-on exercises that can help you start thinking like an experienced data scientist. In this book, you’ll get up close and personal with four extensive case studies based on past Kaggle competitions. You’ll learn how bright minds predicted which drivers would likely avoid filing insurance claims in Brazil and see how expert Kagglers used gradient-boosting methods to model Walmart unit sales time-series data. Get into computer vision by discovering different solutions for identifying the type of disease present on cassava leaves. And see how the Kaggle community created predictive algorithms to solve the natural language processing problem of subjective question-answering. You can use this workbook as a supplement alongside The Kaggle Book or on its own alongside resources available on the Kaggle website and other online communities. Whatever path you choose, this workbook will help make you a formidable Kaggle competitor.

Who is this book for?

If you’re new to Kaggle and want to sink your teeth into practical exercises, start with The Kaggle Book first. A basic understanding of the Kaggle platform, along with knowledge of machine learning and data science, is a prerequisite. This book is suitable for anyone starting their Kaggle journey or veterans trying to get better at it. Data analysts/scientists who want to do better in Kaggle competitions and secure jobs with tech giants will find this book helpful.

What you will learn

  • Take your modeling to the next level by analyzing different case studies
  • Boost your data science skillset with a curated selection of exercises
  • Combine different methods to create better solutions
  • Get a deeper insight into NLP and how it can help you solve unlikely challenges
  • Sharpen your knowledge of time-series forecasting
  • Challenge yourself to become a better data scientist
Estimated delivery fee Deliver to United States

Economy delivery 10 - 13 business days

Free $6.95

Premium delivery 6 - 9 business days

$21.95
(Includes tracking information)

Product Details

Publication date : Feb 24, 2023
Length : 172 pages
Edition : 1st
Language : English
ISBN-13 : 9781804611210
Vendor : Google


Packt Subscriptions

See our plans and pricing

$19.99 billed monthly
  • Unlimited access to Packt's library of 7,000+ practical books and videos
  • Constantly refreshed with 50+ new titles a month
  • Exclusive Early access to books as they're written
  • Solve problems while you work with advanced search and reference features
  • Offline reading on the mobile app
  • Simple pricing, no contract

$199.99 billed annually
  • Unlimited access to Packt's library of 7,000+ practical books and videos
  • Constantly refreshed with 50+ new titles a month
  • Exclusive Early access to books as they're written
  • Solve problems while you work with advanced search and reference features
  • Offline reading on the mobile app
  • Choose a DRM-free eBook or Video every month to keep
  • PLUS own as many other DRM-free eBooks or Videos as you like for just $5 each
  • Exclusive print discounts

$279.99 billed in 18 months
  • Unlimited access to Packt's library of 7,000+ practical books and videos
  • Constantly refreshed with 50+ new titles a month
  • Exclusive Early access to books as they're written
  • Solve problems while you work with advanced search and reference features
  • Offline reading on the mobile app
  • Choose a DRM-free eBook or Video every month to keep
  • PLUS own as many other DRM-free eBooks or Videos as you like for just $5 each
  • Exclusive print discounts

Frequently bought together


The Kaggle Book: $79.99
Developing Kaggle Notebooks: $49.99
The Kaggle Workbook: $29.99
Total: $159.97

Table of Contents

6 Chapters
  • The Most Renowned Tabular Competition – Porto Seguro’s Safe Driver Prediction
  • The Makridakis Competitions – M5 on Kaggle for Accuracy and Uncertainty
  • Vision Competition – Cassava Leaf Disease Competition
  • NLP Competition – Google Quest Q&A Labeling
  • Other Books You May Enjoy
  • Index

Customer reviews

Rating distribution
4.8 (25 Ratings)
5 star 92%
4 star 4%
3 star 0%
2 star 0%
1 star 4%

Top Reviews

Amznswap Feb 26, 2023
Rating: 5
This book is an excellent deep-dive into the nitty-gritties of the Kaggle competition environment. The book is comprehensive, it furnishes diverse competition case-studies in domains like forecasting, NLP and Computer Vision. It provides ample context by distilling the top discussions by leaderboard rankers and complete SotA solution building practise. Particularly impressive are the in-depth sections on the metrics used by these competitions, helping the reader lucidly understand the data-science metrics used by top companies to evaluate ML models. I believe that this book will surely help any novice user get their hands dirty with practical data-science, beyond the theoretical model fundamentals covered in the Kaggle Book.
Amazon Verified review
Daniel Brooks Mar 18, 2023
Rating: 5
A hands-on introduction to machine learning. The book covers 4 example competitions on the Kaggle platform - tabular data, time series analysis, computer vision, and NLP. The commentary on each is thorough and reads easily. A great read for those looking to learn more about Kaggle competitions.
Amazon Verified review
Paul Perry Apr 12, 2023
Rating: 5
This Kaggle Workbook brings great depth in specific areas and Chapter 4 alone is invaluable, especially now as we all delve deeper into NLP, ChatGPT, and Transformers. This is an excellent resource and I'll tell you why. Firstly, for those aspiring to be experts in AI, ML, and NLP, it's essential to immerse yourself in Kaggle, transcend academic learning, and truly grasp what it takes to achieve top-performing solutions. For those who are new or relatively new to Kaggle, you'll definitely need the broader context provided by the Kaggle Book, but as a practitioner, the Kaggle Workbook goes deeper and dissects the top solutions to 4 specific competitions. As someone who has competed many times and reached the level of Kaggle Master, I find it incredibly valuable to have an in-depth walkthrough of a previous competition. Documenting the top solutions is a ton of work and time I don't have, and here it's as if I have a front row seat to a live competition! It also serves as fantastic blueprint for how to study past competitions. I've tried to structure my code, document my solutions and store them on GitHub, but all I offer is messy raw code of a ton of failed experiments. But in this Kaggle Workbook Konrad and Luca do all the work and provide all the links and references, and I appreciate their expert view because I don't want to read every forum post to recreate what happened. I'm now using this book to delve into Chapter 4 and looking at the innovative techniques around Transformers. I'm glad to have found this book.
Amazon Verified review
Samuel de Zoete Mar 21, 2023
Rating: 5
The Kaggle workbook has important exercises, which accompanies the Kaggle book. The Kaggle book was already fantastic for any level of Data Scientist by the way, and now the workbook gives you the confidence and tools to do it really yourself. Having done a few Kaggle competitions in the past, and I can highly recommend it to everyone, regardless your current skill level, there is always something to learn. The Kaggle book and workbook managed to turn this 'always something to learn' into a super practical course in machine learning and I have to use the cliché "A must have... really you do!".
Amazon Verified review
Gianluca Rossi Mar 03, 2023
Rating: 5
The book focuses on four practical examples covering tabular data, time series, NLP, and computer vision. The examples are based on high-ranked solutions in recent Kaggle competitions. The author did a great job describing the reasoning behind every code snippet and sharing tips that can be useful in Kaggle and any ML projects. I particularly appreciated the effort in making the code very readable yet concise. The exercises are educational and require the reader to stop and reason. It's an effortless read, despite the solutions being sophisticated and state-of-the-art. This is a testament to the authors' writing abilities and extensive knowledge. This book is highly recommended for anyone serious about improving their ML skills.
Amazon Verified review

FAQs

What is the delivery time and cost of print books?

Shipping Details

USA:

Economy: Delivery to most addresses in the US within 10-15 business days

Premium: Trackable Delivery to most addresses in the US within 3-8 business days

UK:

Economy: Delivery to most addresses in the U.K. within 7-9 business days.
Shipments are not trackable

Premium: Trackable delivery to most addresses in the U.K. within 3-4 business days!
Add one extra business day for deliveries to Northern Ireland and Scottish Highlands and islands

EU:

Premium: Trackable delivery to most EU destinations within 4-9 business days.

Australia:

Economy: Can deliver to P. O. Boxes and private residences.
Trackable service with delivery to addresses in Australia only.
Delivery time ranges from 7-9 business days for VIC and 8-10 business days for Interstate metro
Delivery time is up to 15 business days for remote areas of WA, NT & QLD.

Premium: Delivery to addresses in Australia only
Trackable delivery to most P. O. Boxes and private residences in Australia within 4-5 days based on the distance to a destination following dispatch.

India:

Premium: Delivery to most Indian addresses within 5-6 business days

Rest of the World:

Premium: Countries in the American continent: Trackable delivery to most countries within 4-7 business days

Asia:

Premium: Delivery to most Asian addresses within 5-9 business days

Disclaimer:
All orders received before 5 PM U.K time would start printing from the next business day. So the estimated delivery times start from the next day as well. Orders received after 5 PM U.K time (in our internal systems) on a business day or anytime on the weekend will begin printing the second to next business day. For example, an order placed at 11 AM today will begin printing tomorrow, whereas an order placed at 9 PM tonight will begin printing the day after tomorrow.


Unfortunately, due to several restrictions, we are unable to ship to the following countries:

  1. Afghanistan
  2. American Samoa
  3. Belarus
  4. Brunei Darussalam
  5. Central African Republic
  6. The Democratic Republic of Congo
  7. Eritrea
  8. Guinea-Bissau
  9. Iran
  10. Lebanon
  11. Libyan Arab Jamahiriya
  12. Somalia
  13. Sudan
  14. Russian Federation
  15. Syrian Arab Republic
  16. Ukraine
  17. Venezuela

What is custom duty/charge?

Customs duties are charges levied on goods when they cross international borders. They are a tax imposed on imported goods. These duties are charged by special authorities and bodies created by local governments and are meant to protect local industries, economies, and businesses.

Do I have to pay customs charges for the print book order?

Orders shipped to the countries listed under EU27 will not bear customs charges. They are paid by Packt as part of the order.

List of EU27 countries: www.gov.uk/eu-eea

For countries outside the EU27, customs duty or localized taxes may apply to the shipment and are charged by the recipient country. These must be paid by the customer and are not included in the shipping charges on the order.

How do I know my custom duty charges?

The amount of duty payable varies greatly depending on the imported goods, the country of origin and several other factors like the total invoice amount or dimensions like weight, and other such criteria applicable in your country.

For example:

  • If you live in Mexico, and the declared value of your ordered items is over $ 50, for you to receive a package, you will have to pay additional import tax of 19% which will be $ 9.50 to the courier service.
  • Whereas if you live in Turkey, and the declared value of your ordered items is over € 22, for you to receive a package, you will have to pay additional import tax of 18% which will be € 3.96 to the courier service.

How can I cancel my order?

Cancellation Policy for Published Printed Books:

You can cancel any order within 1 hour of placing the order. Simply contact customercare@packt.com with your order details or payment transaction id. If your order has already started the shipment process, we will do our best to stop it. However, if it is already on the way to you then when you receive it, you can contact us at customercare@packt.com using the returns and refund process.

Please understand that Packt Publishing cannot provide refunds or cancel any order except in the cases described in our Return Policy (i.e. Packt Publishing agrees to replace your printed book because it arrives damaged or has a material defect); otherwise, Packt Publishing will not accept returns.

What is your returns and refunds policy?

Return Policy:

We want you to be happy with your purchase from Packtpub.com. We will not hassle you with returning print books to us. If the print book you receive from us is incorrect, damaged, doesn't work or is unacceptably late, please contact Customer Relations Team on customercare@packt.com with the order number and issue details as explained below:

  1. If you ordered (eBook, Video or Print Book) incorrectly or accidentally, please contact Customer Relations Team on customercare@packt.com within one hour of placing the order and we will replace/refund you the item cost.
  2. Sadly, if your eBook or Video file is faulty or a fault occurs during the eBook or Video being made available to you, i.e. during download then you should contact Customer Relations Team within 14 days of purchase on customercare@packt.com who will be able to resolve this issue for you.
  3. You will have a choice of replacement or refund of the problem items (damaged, defective or incorrect).
  4. Once Customer Care Team confirms that you will be refunded, you should receive the refund within 10 to 12 working days.
  5. If you are only requesting a refund of one book from a multiple order, then we will refund you the appropriate single item.
  6. Where the items were shipped under a free shipping offer, there will be no shipping costs to refund.

On the off chance your printed book arrives damaged or with a material defect, contact our Customer Relations Team at customercare@packt.com within 14 days of receipt of the book with appropriate evidence of damage, and we will work with you to secure a replacement copy, if necessary. Please note that each printed book you order from us is individually made by Packt's professional book-printing partner on a print-on-demand basis.

What tax is charged?

Currently, no tax is charged on the purchase of any print book (subject to change based on the laws and regulations). A localized VAT fee is charged only to our European and UK customers on eBooks, Video and subscriptions that they buy. GST is charged to Indian customers for eBooks and video purchases.

What payment methods can I use?

You can pay with the following card types:

  1. Visa Debit
  2. Visa Credit
  3. MasterCard
  4. PayPal