Overview of deep learning techniques to handle data imbalance
As in the first half of this book, where we focused on classical machine learning techniques, the major categories of deep learning techniques for handling data imbalance typically include sampling techniques, cost-sensitive techniques, threshold adjustment techniques, or a combination of these:
- Sampling techniques either undersample the majority class or oversample the minority class. Data augmentation, a fundamental technique in computer vision, is used to increase the diversity of the training set. While it is not an oversampling method aimed directly at class imbalance, data augmentation does have the effect of expanding the training data. We will discuss these techniques in more detail in Chapter 7, Data-Level Deep Learning Methods.
- Cost-sensitive techniques usually involve modifying the model's loss function to account for the higher cost of misclassifying minority class examples. Some standard...
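The two basic sampling strategies fit in a few lines. The following is a minimal sketch of plain random oversampling, which duplicates minority examples with replacement, and random undersampling, which drops majority examples without replacement. It assumes the data fits in Python lists. The helpers `random_oversample` and `random_undersample` are hypothetical names used for illustration, not functions from any library. In deep learning pipelines, the same idea is often applied per batch inside the data loader rather than by materializing a resampled copy of the dataset.

```python
import random
from collections import Counter

# Hypothetical helper for illustration: plain random oversampling.
def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen examples of each smaller class until every
    class matches the size of the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(samples), list(labels)
    for cls, n in counts.items():
        pool = [x for x, y in zip(samples, labels) if y == cls]
        # Draw (target - n) extra copies with replacement from this class.
        for x in rng.choices(pool, k=target - n):
            out_x.append(x)
            out_y.append(cls)
    return out_x, out_y

# Hypothetical helper for illustration: plain random undersampling.
def random_undersample(samples, labels, seed=0):
    """Keep a random subset of each larger class so every class matches
    the size of the smallest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = min(counts.values())
    out_x, out_y = [], []
    for cls in counts:
        pool = [x for x, y in zip(samples, labels) if y == cls]
        # Sample without replacement so no example appears twice.
        for x in rng.sample(pool, target):
            out_x.append(x)
            out_y.append(cls)
    return out_x, out_y

# Toy data: 8 majority examples (class 0) and 2 minority examples (class 1).
X = list(range(10))
y = [0] * 8 + [1] * 2
Xo, yo = random_oversample(X, y)
Xu, yu = random_undersample(X, y)
print(dict(Counter(yo)))  # {0: 8, 1: 8}
print(dict(Counter(yu)))  # {0: 2, 1: 2}
```

Oversampling keeps every original example but repeats minority ones, which can encourage overfitting to those few examples. Undersampling avoids repetition but discards majority-class information.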