Motivation for algorithm-level techniques
In this chapter, we will concentrate on deep learning techniques that have gained popularity in both the vision and text domains. We will mostly use a long-tailed, imbalanced version of the MNIST dataset, similar to what we used in Chapter 7, Data-Level Deep Learning Methods. We will also consider CIFAR10-LT, the long-tailed version of CIFAR10, which is widely used by researchers working with long-tailed datasets.
The ideas here closely parallel those from Chapter 5, Cost-Sensitive Learning, where the high-level idea was to increase the weight of the positive (minority) class and decrease the weight of the negative (majority) class in the model's cost function. To facilitate this adjustment to the loss function, frameworks such as scikit-learn and XGBoost offer dedicated parameters: scikit-learn provides class_weight and sample_weight, while XGBoost offers scale_pos_weight.
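As a minimal sketch of how these parameters are used (the synthetic data and the negative-to-positive weighting ratio below are illustrative assumptions, not an example from this book):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from xgboost import XGBClassifier

# Synthetic imbalanced binary data: roughly 90% negatives, 10% positives
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))
y = (rng.random(1000) < 0.1).astype(int)

# scikit-learn: class_weight="balanced" reweights each class
# inversely proportionally to its frequency in y
clf = LogisticRegression(class_weight="balanced", max_iter=1000)
clf.fit(X, y)

# sample_weight expresses the same idea per example: here, each
# positive gets weight (num negatives / num positives), each negative 1.0
weights = np.where(y == 1, (y == 0).sum() / (y == 1).sum(), 1.0)
clf2 = LogisticRegression(max_iter=1000)
clf2.fit(X, y, sample_weight=weights)

# XGBoost: scale_pos_weight scales the loss contribution of positive
# examples; a common heuristic is (num negatives / num positives)
ratio = (y == 0).sum() / (y == 1).sum()
xgb = XGBClassifier(scale_pos_weight=ratio)
xgb.fit(X, y)
```

All three options push the model toward the same outcome: misclassifying a minority-class example becomes more costly than misclassifying a majority-class one.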