Oversampling Methods
In machine learning, we often don’t have enough samples of the minority class. One solution is to gather more samples of that class. For example, when detecting whether a patient has cancer, if we don’t have enough samples of the cancer class, we can wait and collect more over time. However, this strategy is not always feasible or sensible, and it can be time-consuming. In such cases, we can augment our data using various techniques. One such technique is oversampling.
In this chapter, we will introduce the concept of oversampling, discuss when to use it, and survey the various techniques for performing it. We will also demonstrate how to apply these techniques through the imbalanced-learn library APIs and compare their performance using some classical machine learning models. Finally, we will conclude with practical advice on which techniques tend to work best under specific real-world conditions.
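To make the idea concrete before turning to the library, here is a minimal sketch of the simplest form of oversampling: duplicating randomly chosen minority samples until the classes are the same size. It uses only the Python standard library. The function name `random_oversample`, the `seed` parameter, and the toy data are assumptions made for illustration and are not part of imbalanced-learn, which offers a `RandomOverSampler` class for the same purpose.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen samples until every class matches the largest one.

    Hypothetical helper for illustration only; not an imbalanced-learn API.
    """
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())  # size of the majority class
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        # Indices of the original samples that belong to this class.
        idx = [i for i, lab in enumerate(y) if lab == label]
        # Draw with replacement to make up the shortfall (none for the majority).
        for i in rng.choices(idx, k=target - count):
            X_out.append(X[i])
            y_out.append(label)
    return X_out, y_out


# Toy data: 8 "healthy" samples (label 0) and 2 "cancer" samples (label 1).
X = [[float(i)] for i in range(10)]
y = [0] * 8 + [1] * 2

X_res, y_res = random_oversample(X, y)
print(Counter(y), "->", Counter(y_res))
```

The resampled set keeps every original sample and only appends copies of existing minority samples, so no new information is created. That limitation is one reason more elaborate oversampling techniques exist.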
In this...