When to avoid undersampling the majority class
Undersampling is not a panacea and will not always help; whether it does depends on the dataset and the model under consideration:
- Too little training data overall: If the dataset is already small, undersampling the majority class can discard a significant amount of information. In such cases, it is advisable to gather more data or explore other techniques, such as oversampling the minority class, to balance the class distribution.
- Majority class equally important or more important than the minority class: In specific scenarios, such as the spam filtering example mentioned in Chapter 1, Introduction to Data Imbalance in Machine Learning, it is crucial to classify majority class instances (legitimate emails) correctly. In such situations, undersampling the majority class might reduce the model’s ability to classify majority class instances accurately, leading to a higher false positive rate. Instead, alternative methods...
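When the dataset is too small to undersample, the first bullet suggests oversampling the minority class instead. As a minimal sketch of that idea, the snippet below implements naive random oversampling with NumPy only; the function name `oversample_minority` and the toy 90/10 data are illustrative assumptions, not code from the book (libraries such as imbalanced-learn provide production-ready equivalents):

```python
import numpy as np

def oversample_minority(X, y, random_state=0):
    """Randomly duplicate minority-class samples (with replacement)
    until both classes have equal counts. Assumes binary labels."""
    rng = np.random.default_rng(random_state)
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    n_needed = counts.max() - counts.min()
    minority_idx = np.flatnonzero(y == minority)
    # Sample extra minority indices with replacement and append them
    extra = rng.choice(minority_idx, size=n_needed, replace=True)
    X_res = np.concatenate([X, X[extra]])
    y_res = np.concatenate([y, y[extra]])
    return X_res, y_res

# Toy example: 90 majority samples vs. 10 minority samples
X = np.arange(100).reshape(-1, 1)
y = np.array([0] * 90 + [1] * 10)
X_res, y_res = oversample_minority(X, y)
print(np.bincount(y_res))  # both classes now have 90 samples
```

Because the minority rows are exact duplicates, this approach risks overfitting on small datasets; synthetic-sample methods such as SMOTE are a common refinement.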