Machine Learning for Imbalanced Data

Machine Learning for Imbalanced Data: Tackle imbalanced datasets using machine learning and deep learning techniques

By Kumar Abhishek and Dr. Mounir Abdelaziz
5.0 (17 Ratings)
Paperback | Nov 2023 | 344 pages | 1st Edition
eBook: $27.98 (discounted from $39.99)
Paperback: $49.99
Subscription: free trial, renews at $19.99 p/m

What do you get with a Packt Subscription?

Free for the first 7 days. $19.99 p/m after that. Cancel any time!

  • Unlimited ad-free access to the largest independent learning library in tech. Access this title and thousands more!
  • 50+ new titles added per month, including many first-to-market concepts and exclusive early access to books as they are being written.
  • Innovative learning tools, including AI book assistants, code context explainers, and text-to-speech.
  • Thousands of reference materials covering every tech concept you need to stay up to date.

Machine Learning for Imbalanced Data

Oversampling Methods

In machine learning, we often don’t have enough samples of the minority class. One solution might be to gather more samples of that class. For example, when detecting whether a patient has cancer, if we don’t have enough samples of the cancer class, we could simply wait to collect more. However, such a strategy is not always feasible or sensible, and it can be time-consuming. In such cases, we can augment our data using various techniques, one of which is oversampling.

In this chapter, we will introduce the concept of oversampling, discuss when to use it, and cover the various techniques for performing it. We will also demonstrate how to utilize these techniques through the imbalanced-learn library APIs and compare their performance using some classical machine learning models. Finally, we will conclude with practical advice on which techniques tend to work best under specific real-world conditions.

In this...

Technical requirements

In this chapter, we will utilize common libraries such as numpy, scikit-learn, and imbalanced-learn. The code and notebooks for this chapter are available on GitHub at https://github.com/PacktPublishing/Machine-Learning-for-Imbalanced-Data/tree/master/chapter02. You can fire up the GitHub notebook in Google Colab by clicking the Open in Colab icon at the top of this chapter’s notebook, or launch it from https://colab.research.google.com using the notebook’s GitHub URL.
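
If you prefer to run the notebooks locally rather than on Colab, the dependencies can be installed with pip; this one-liner is a sketch, and the book may use different versions:

    pip install numpy scikit-learn imbalanced-learn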

What is oversampling?

Sampling involves selecting a subset of observations from a larger set of observations. In this chapter, we’ll initially focus on binary classification problems with two classes: the positive class and the negative class. The minority class has significantly fewer instances than the majority class. Toward the end of this chapter, we will look into oversampling for multi-class classification problems.

Oversampling is a data balancing technique that generates more samples of the minority class. While we describe it here for the binary case, it scales readily to problems where several classes are imbalanced. Figure 2.1 shows how the minority and majority classes are imbalanced (a) initially and balanced (b) after applying an oversampling technique:

Figure 2.1 – An increase in the number of minority class samples after oversampling
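
To make the setup concrete, here is a minimal sketch (not from the book) that builds a synthetic imbalanced binary dataset with scikit-learn; the 9:1 class ratio is an illustrative choice:

    from collections import Counter
    from sklearn.datasets import make_classification

    # Binary dataset where class 1 is the minority (~10% of the samples)
    X, y = make_classification(
        n_samples=1000, n_features=4, n_informative=2,
        weights=[0.9, 0.1], random_state=42,
    )
    print(Counter(y))  # roughly Counter({0: 900, 1: 100})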

...

Random oversampling

The simplest strategy for balancing an imbalanced dataset is to randomly choose samples of the minority class and repeat or duplicate them. This is also called random oversampling with replacement.

To increase the number of minority class observations, we can replicate them enough times to balance the two classes. Does this sound too trivial? Yes, but it works. By increasing the number of minority class samples, random oversampling reduces the bias toward the majority class, which helps the model learn the patterns and characteristics of the minority class more effectively.

We will use the random oversampling implementation from the imbalanced-learn library. The fit_resample API of the RandomOverSampler class resamples the original dataset and balances it. The sampling_strategy parameter specifies the desired ratio between classes; for example, setting sampling_strategy=1.0 yields an equal number of samples in the two classes.
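
Here is a minimal usage sketch of that API (the dataset construction is illustrative):

    from collections import Counter
    from sklearn.datasets import make_classification
    from imblearn.over_sampling import RandomOverSampler

    X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=42)

    # sampling_strategy=1.0 requests a 1:1 minority-to-majority ratio
    ros = RandomOverSampler(sampling_strategy=1.0, random_state=42)
    X_res, y_res = ros.fit_resample(X, y)
    print(Counter(y_res))  # both classes now have the same count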

There are...

SMOTE

The main problem with random oversampling is that it duplicates the observations from the minority class. This can often cause overfitting. Synthetic Minority Oversampling Technique (SMOTE) [2] solves this problem of duplication by using a technique called interpolation.

Interpolation involves creating new data points in the range of known data points. Think of interpolation as being similar to the process of reproduction in biology. In reproduction, two individuals come together to produce a new individual with traits of both of them. Similarly, in interpolation, we pick two observations from the dataset and create a new observation by choosing a random point on the line joining the two selected points.

We oversample the minority class by interpolating synthetic examples. That prevents the duplication of minority samples while generating new synthetic observations similar to the known points. Figure 2.5 depicts how SMOTE works:

Figure 2.5 –...
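
In code, the interpolation the figure depicts can be exercised through the imbalanced-learn SMOTE class; this is a sketch with illustrative parameter values, not the book’s exact listing:

    from collections import Counter
    from sklearn.datasets import make_classification
    from imblearn.over_sampling import SMOTE

    X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=42)

    # For each synthetic point, SMOTE picks a minority sample x and one of its
    # k nearest minority neighbors x_nn, then emits x + lam * (x_nn - x),
    # with lam drawn uniformly from [0, 1]
    smote = SMOTE(k_neighbors=5, random_state=42)
    X_res, y_res = smote.fit_resample(X, y)
    print(Counter(y_res))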

SMOTE variants

Now, let’s look at some SMOTE variants, such as Borderline-SMOTE, SMOTE-NC, and SMOTEN. Each variant applies the SMOTE algorithm to a particular kind of sample or feature, so not every variant is applicable to every dataset.

Borderline-SMOTE

Borderline-SMOTE [4] is a variation of SMOTE that generates synthetic samples from the minority class samples that are near the classification boundary, which divides the majority class from the minority class.

Why consider samples on the classification boundary?

The idea is that examples near the classification boundary are more prone to misclassification than those far away from it. Producing more minority samples along the boundary helps the model learn the minority class better. Intuitively, points far from the classification boundary are unlikely to make the model a better classifier.

Here’s a step-by-step algorithm for Borderline-SMOTE:

  1. We run a...
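
However the remaining steps unfold, the imbalanced-learn implementation can be used directly; here is a minimal sketch with illustrative parameter values, where m_neighbors controls the borderline test and k_neighbors the SMOTE-style interpolation:

    from collections import Counter
    from sklearn.datasets import make_classification
    from imblearn.over_sampling import BorderlineSMOTE

    X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=42)

    # kind="borderline-1" interpolates only between minority samples;
    # "borderline-2" may also interpolate toward majority neighbors
    bsmote = BorderlineSMOTE(kind="borderline-1", k_neighbors=5, m_neighbors=10,
                             random_state=42)
    X_res, y_res = bsmote.fit_resample(X, y)
    print(Counter(y_res))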

ADASYN

While SMOTE doesn’t take the density distribution of minority class samples into account, Adaptive Synthetic Sampling (ADASYN) [6] focuses on the harder-to-classify minority class samples, which tend to lie in low-density areas. ADASYN uses a weighted distribution over the minority class based on how difficult each observation is to classify, so more synthetic data is generated from the harder samples:

Figure 2.11 – Illustration of how ADASYN works

Here, we can see the following:

  • a) The majority and minority class samples are plotted
  • b) Synthetic samples are generated depending on the hardness factor (explained later)

While SMOTE uses all minority class samples uniformly for oversampling, ADASYN uses the observations that are harder to classify more often.

Another difference between the two techniques is that, unlike SMOTE, ADASYN also uses the majority class observations when fitting the k-nearest neighbors (KNN) model. It then...
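
Usage-wise, ADASYN follows the same fit_resample pattern as the other samplers; a minimal sketch with illustrative parameters:

    from collections import Counter
    from sklearn.datasets import make_classification
    from imblearn.over_sampling import ADASYN

    X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=42)

    # n_neighbors sets the neighborhood used to score how hard each minority
    # sample is: the more majority-class neighbors, the more synthetic points
    adasyn = ADASYN(n_neighbors=5, random_state=42)
    X_res, y_res = adasyn.fit_resample(X, y)
    print(Counter(y_res))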

Model performance comparison of various oversampling methods

Let’s examine how some popular models perform with the different oversampling techniques we’ve discussed. We’ll use two datasets for this comparison: one synthetic and one real-world dataset. We’ll evaluate the performance of four oversampling techniques, as well as no sampling, using logistic regression and random forest models.

You can find all the related code in this book’s GitHub repository. In Figure 2.15 and Figure 2.16, we can see the average precision score values for both models on the two datasets:

Figure 2.15 – Performance comparison of various oversampling techniques on a synthetic dataset

Figure 2.16 – Performance comparison of various oversampling techniques on the thyroid_sick dataset
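
A comparison of this shape can be sketched as follows; this is an illustrative harness, not the book’s exact experiment. Note that resampling is applied only to the training split, and each model is scored with average precision on an untouched test split:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import average_precision_score
    from sklearn.model_selection import train_test_split
    from imblearn.over_sampling import ADASYN, SMOTE, BorderlineSMOTE, RandomOverSampler

    X, y = make_classification(n_samples=5000, weights=[0.95, 0.05], random_state=42)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=42)

    samplers = {
        "none": None,
        "random": RandomOverSampler(random_state=42),
        "smote": SMOTE(random_state=42),
        "borderline": BorderlineSMOTE(random_state=42),
        "adasyn": ADASYN(random_state=42),
    }
    for name, sampler in samplers.items():
        if sampler is None:
            X_bal, y_bal = X_tr, y_tr  # baseline: no resampling
        else:
            X_bal, y_bal = sampler.fit_resample(X_tr, y_tr)
        for model in (LogisticRegression(max_iter=1000),
                      RandomForestClassifier(random_state=42)):
            model.fit(X_bal, y_bal)
            ap = average_precision_score(y_te, model.predict_proba(X_te)[:, 1])
            print(f"{name:>10}  {type(model).__name__:<22}  AP={ap:.3f}")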

Based on these plots, we can draw some useful conclusions:

  • Effectiveness of oversampling: In general, using...

Guidance for using various oversampling techniques

Now, let’s review some guidelines for navigating the various oversampling techniques we have covered, and how these techniques differ from each other:

  1. Train a model without applying any sampling techniques. This will be our model with baseline performance. Any oversampling technique we apply is expected to give a boost to this performance.
  2. Start with random oversampling, and add some shrinkage too. We may have to try a few values of shrinkage to see whether the model’s performance improves (see the sketch after this list).
  3. When we have categorical features, we have a couple of options:
    1. Convert all categorical features into numerical features first using one-hot encoding, label encoding, feature hashing, or other feature transformation techniques.
    2. (Only for nominal categorical features) Use SMOTENC and SMOTEN directly on the data.
  4. Apply various oversampling techniques – random oversampling, SMOTE, Borderline-SMOTE, and...
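
As referenced in step 2, here is a minimal sketch of random oversampling with shrinkage (a smoothed bootstrap): with shrinkage > 0, the duplicated points are jittered with noise rather than repeated exactly. The value 0.2 is purely illustrative:

    from sklearn.datasets import make_classification
    from imblearn.over_sampling import RandomOverSampler

    X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=42)

    # With shrinkage set, new samples are noisy copies drawn around the
    # original minority points rather than exact duplicates
    ros = RandomOverSampler(shrinkage=0.2, random_state=42)
    X_res, y_res = ros.fit_resample(X, y)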

Oversampling in multi-class classification

In multi-class classification problems, we have more than two classes or labels to predict, and hence more than one class may be imbalanced. This adds some complexity to the problem, but we can apply the same techniques to multi-class classification problems as well. The imbalanced-learn library supports multi-class classification in almost all of its methods, and we can choose among various sampling strategies through the sampling_strategy parameter. In the SMOTE API, this parameter accepts some fixed string values (called built-in strategies). We can also pass a dictionary with the following:

  • Keys as the class labels
  • Values as the number of samples of that class

Here are the built-in strategies for sampling_strategy when using the parameter as a string:

  • The minority strategy resamples only the minority class.
  • The not...
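
Here is a minimal sketch of both options; the three-class dataset and the per-class counts in the dictionary are illustrative choices:

    from collections import Counter
    from sklearn.datasets import make_classification
    from imblearn.over_sampling import SMOTE

    # Three classes with roughly an 80/15/5 split
    X, y = make_classification(n_samples=2000, n_classes=3, n_informative=4,
                               weights=[0.80, 0.15, 0.05], random_state=42)

    # Built-in string strategy: oversample every class except the majority one
    X_a, y_a = SMOTE(sampling_strategy="not majority",
                     random_state=42).fit_resample(X, y)

    # Dictionary strategy: keys are class labels, values are target sample counts
    X_b, y_b = SMOTE(sampling_strategy={1: 1000, 2: 800},
                     random_state=42).fit_resample(X, y)
    print(Counter(y_a), Counter(y_b))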

Summary

In this chapter, we went through various oversampling techniques for dealing with imbalanced datasets and applied them using Python’s imbalanced-learn library (also called imblearn). We also saw the internal workings of some of the techniques by implementing them from scratch. While random oversampling generates new minority class samples by duplicating existing ones, SMOTE-based techniques generate synthetic samples along the line segments toward the nearest neighbors of minority class samples. Though oversampling can cause the model to overfit your data, it usually has more pros than cons, depending on the data and model.

We applied these techniques to synthetic and publicly available datasets and benchmarked their performance and effectiveness. We saw how different oversampling techniques can lead to widely varying model performance, so it is crucial to try a few of them and pick the one that works best for our data.

...

Exercises

  1. Explore two SMOTE variants not discussed in this chapter, KMeans-SMOTE and SVM-SMOTE, from the imbalanced-learn library. Compare their performance with vanilla SMOTE, Borderline-SMOTE, and ADASYN using the logistic regression and random forest models.
  2. For a classification problem with two classes, let’s say the minority class to majority class ratio is 1:20. How should we balance this dataset? Should we apply the balancing technique at test or evaluation time? Please provide a reason for your answer.
  3. Let’s say we are trying to build a model that can estimate whether a person should be granted a bank loan. Out of the 5,000 observations we have, only 500 people got their loan approved. To balance the dataset, we duplicate the data of the approved applicants and then split it into train, test, and validation datasets. Are there any issues with this approach?
  4. Data normalization helps in dealing with data imbalance. Is this true? Why...

References

  1. Protecting Personal Data in Grab’s Imagery (2021), https://engineering.grab.com/protecting-personal-data-in-grabs-imagery.
  2. N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, SMOTE: Synthetic Minority Over-sampling Technique, Journal of Artificial Intelligence Research, vol. 16, pp. 321–357, Jun. 2002, doi: 10.1613/jair.953.
  3. Live Site Incident escalation forecast (2023), https://medium.com/data-science-at-microsoft/live-site-incident-escalation-forecast-566763a2178.
  4. H. Han, W.-Y. Wang, and B.-H. Mao, Borderline-SMOTE: A New Over-Sampling Method in Imbalanced Data Sets Learning, in Advances in Intelligent Computing, Lecture Notes in Computer Science, vol. 3644, Springer, Berlin, Heidelberg, 2005, pp. 878–887, doi: 10.1007/11538059_91.
  5. P. Meiyappan and M. Bales, Position Paper: Reducing Amazon’s packaging waste using multimodal deep learning, (2021), article: https://www.amazon.science...

Key benefits

  • Understand how to use modern machine learning frameworks with detailed explanations, illustrations, and code samples
  • Learn cutting-edge deep learning techniques to overcome data imbalance
  • Explore different methods for dealing with skewed data in ML and DL applications
  • Purchase of the print or Kindle book includes a free PDF eBook

Description

As machine learning practitioners, we often encounter imbalanced datasets in which one class has considerably fewer instances than the other. Many machine learning algorithms assume an equilibrium between majority and minority classes, leading to suboptimal performance on imbalanced data. This comprehensive guide helps you address this class imbalance to significantly improve model performance. Machine Learning for Imbalanced Data begins by introducing you to the challenges posed by imbalanced datasets and the importance of addressing these issues. It then guides you through techniques that enhance the performance of classical machine learning models when using imbalanced data, including various sampling and cost-sensitive learning methods. As you progress, you’ll delve into similar and more advanced techniques for deep learning models, employing PyTorch as the primary framework. Throughout the book, hands-on examples will provide working and reproducible code that’ll demonstrate the practical implementation of each technique. By the end of this book, you’ll be adept at identifying and addressing class imbalances and confidently applying various techniques, including sampling, cost-sensitive techniques, and threshold adjustment, while using traditional machine learning or deep learning models.

Who is this book for?

This book is for machine learning practitioners who want to effectively address the challenges of imbalanced datasets in their projects. Data scientists, machine learning engineers, and research scientists will find this book helpful. Though complete beginners are welcome to read this book, some familiarity with core machine learning concepts will help readers maximize the benefits and insights gained from this comprehensive resource.

What you will learn

  • Use imbalanced data in your machine learning models effectively
  • Explore the metrics used when classes are imbalanced
  • Understand how and when to apply various sampling methods such as over-sampling and under-sampling
  • Apply data-based, algorithm-based, and hybrid approaches to deal with class imbalance
  • Combine and choose from various options for data balancing while avoiding common pitfalls
  • Understand the concepts of model calibration and threshold adjustment in the context of dealing with imbalanced datasets

Product Details

Publication date : Nov 30, 2023
Length : 344 pages
Edition : 1st
Language : English
ISBN-13 : 9781801070836



Frequently bought together

Causal Inference and Discovery in Python – $39.99
Machine Learning for Imbalanced Data – $49.99
Python Deep Learning – $49.99
Total: $139.97

Table of Contents

13 Chapters
Chapter 1: Introduction to Data Imbalance in Machine Learning
Chapter 2: Oversampling Methods
Chapter 3: Undersampling Methods
Chapter 4: Ensemble Methods
Chapter 5: Cost-Sensitive Learning
Chapter 6: Data Imbalance in Deep Learning
Chapter 7: Data-Level Deep Learning Methods
Chapter 8: Algorithm-Level Deep Learning Techniques
Chapter 9: Hybrid Deep Learning Methods
Chapter 10: Model Calibration
Assessments
Index
Other Books You May Enjoy

Customer reviews

Rating distribution: 5.0 (17 Ratings)
5 star: 100% | 4 star: 0% | 3 star: 0% | 2 star: 0% | 1 star: 0%
Ranja – Feb 06, 2024 – 5 stars
This book on tackling real imbalanced datasets in machine learning is a detailed and comprehensive guide. The chapters ‘cost-sensitive learning’ and ‘model calibration’ deserve special mention, and they blend in well with the other chapters on over-sampling, under-sampling, and ensemble techniques for handling data imbalance. While some essential concepts have in-depth explanations, and rightfully so, the authors have managed to keep the book intriguing throughout, which makes it a prized resource for all machine learning practitioners.
Amazon Verified review

Advitya Gemawat – Jan 07, 2024 – 5 stars
The book covers various methods to address the class imbalance problem, and covers usage with popular Python libraries and typical evaluation metrics from the lens of class imbalance. Here are some of my top takeaways from the book:
🎲 Sampling methods, such as over-sampling, under-sampling, and hybrid sampling, to balance the data distribution
📊 Cost-sensitive learning, which assigns different weights or costs to different classes, to make the model more sensitive to the minority class
📈 Threshold adjustment, which modifies the decision threshold of the model, to improve the performance metrics
🗂 Model calibration, which adjusts the predicted probabilities of the model, to make them more reliable and interpretable
🚀 My favorite part of the book: how several big tech companies are solving data imbalance challenges in different contexts
🗃 There's a Python library, imbalanced-learn, that offers out-of-the-box techniques to deal with data imbalance and can also be used to create corresponding synthetic datasets
Having read several books from Packt, it's so interesting to go through these books as they deal with very specific subtopics within ML and provide an entire landscape of practical techniques, real-world use cases, and top takeaways for practitioners based on research findings.
Amazon Verified review

H2N – Dec 14, 2023 – 5 stars
Machine Learning for Imbalanced Data is a helpful guide to dealing with imbalanced data in machine learning. The authors discuss various strategies and best practices for addressing the complexity of data imbalance, underscoring the importance of context. The book covers many techniques, from oversampling methods to deep learning approaches, with real-world applications. A nice book for anyone learning and working in machine learning.
Amazon Verified review

Ashish Tiwari – Dec 14, 2023 – 5 stars
"Machine Learning for Imbalanced Data" is an insightful, 300+ page journey into the complexities of machine learning, especially tailored for those with some prior experience. It's a well-crafted guide that demystifies topics like oversampling, undersampling, deep learning techniques, and model calibration in rich detail. The book excels in blending theoretical concepts with practical Python code examples, making it a valuable reference for real-world applications. Its approachable style, coupled with comprehensive content, makes it an indispensable resource for anyone looking to master the intricacies of machine learning in the context of imbalanced data.
Amazon Verified review

Snigdha – Dec 31, 2023 – 5 stars
This book provided a great overview, in a concise and clear format, of dealing with imbalanced datasets and what techniques to use. The text contains helpful examples and insights from the authors' industry experience. I enjoyed the cartoon strips added to chapters for easy understanding. The Colab notebooks provided in the GitHub repo offer the coding practice needed to apply the theory in the book. I would recommend this to anyone learning more about machine learning, as most datasets in real life are imbalanced.
Amazon Verified review

FAQs

What is included in a Packt subscription?

A subscription provides you with full access to view all Packt and licensed content online, including exclusive access to Early Access titles. Depending on the tier chosen, you can also earn credits and discounts to use toward owning content.

How can I cancel my subscription?

To cancel your subscription, simply go to the account page - found in the top right of the page or at https://subscription.packtpub.com/my-account/subscription. From here, you will see the ‘cancel subscription’ button in the grey box containing your subscription information.

What are credits?

Credits can be earned by reading 40 sections of any title within the payment cycle - a month starting from the day of subscription payment. You also earn a credit every month if you subscribe to our annual or 18-month plans. Credits can be used to buy books DRM-free, the same way that you would pay for a book. Your credits can be found on the subscription homepage - subscription.packtpub.com - by clicking on the ‘My Library’ dropdown and selecting ‘Credits’.

What happens if an Early Access Course is cancelled?

Projects are rarely cancelled, but sometimes it's unavoidable. If an Early Access course is cancelled or excessively delayed, you can exchange your purchase for another course. For further details, please contact us here.

Where can I send feedback about an Early Access title?

If you have any feedback about the product you're reading, or Early Access in general, then please fill out a contact form here and we'll make sure the feedback gets to the right team. 

Can I download the code files for Early Access titles?

We try to ensure that all books in Early Access have code available to use, download, and fork on GitHub. This helps us be more agile in the development of the book, and helps keep the often changing code base of new versions and new technologies as up to date as possible. Unfortunately, however, there will be rare cases when it is not possible for us to have downloadable code samples available until publication.

When we publish the book, the code files will also be available to download from the Packt website.

How accurate is the publication date?

The publication date is as accurate as we can be at any point in the project. Unfortunately, delays can happen. Often those delays are out of our control, such as changes to the technology code base or delays in the tech release. We do our best to give you an accurate estimate of the publication date at any given time, and as more chapters are delivered, the more accurate the delivery date will become.

How will I know when new chapters are ready?

We'll let you know every time there has been an update to a course that you've bought in Early Access. You'll get an email to let you know there has been a new chapter, or a change to a previous chapter. The new chapters are automatically added to your account, so you can also check back there any time you're ready and download or read them online.

I am a Packt subscriber, do I get Early Access?

Yes, all Early Access content is fully available through your subscription. You will need a paid or active trial subscription in order to access all titles.

How is Early Access delivered?

Early Access is currently only available as a PDF or through our online reader. As we make changes or add new chapters, the files in your Packt account will be updated so you can download them again or view them online immediately.

How do I buy Early Access content?

Early Access is a way of us getting our content to you quicker, but the method of buying the Early Access course is still the same. Just find the course you want to buy, go through the check-out steps, and you’ll get a confirmation email from us with information and a link to the relevant Early Access courses.

What is Early Access?

Keeping up to date with the latest technology is difficult; new versions, new frameworks, new techniques. This feature gives you a head start on our content as it's being created. With Early Access, you'll receive each chapter as it's written and get regular updates throughout the product's development, as well as the final course as soon as it's ready. We created Early Access as a means of giving you the information you need as soon as it's available. As we go through the process of developing a course, 99% of it can be ready, but we can't publish until that last 1% falls into place. Early Access helps unlock the potential of our content early, to help you start your learning when you need it most. You not only get access to every chapter as it's delivered, edited, and updated, but you'll also get the finalized, DRM-free product to download in any format you want when it's published. As a member of Packt, you'll also be eligible for our exclusive offers, including a free course every day, and discounts on new and popular titles.