SMOTE variants
Now, let’s look at some SMOTE variants: Borderline-SMOTE, SMOTE-NC, and SMOTEN. Each variant applies the SMOTE algorithm to a particular kind of sample or data, so none of them is applicable in every situation.
Borderline-SMOTE
Borderline-SMOTE [4] is a variation of SMOTE that generates synthetic samples only from the minority class samples that lie near the classification boundary, that is, the boundary that separates the majority class from the minority class.
Why consider samples on the classification boundary?
The idea is that examples near the classification boundary are more prone to misclassification than those far away from it. Producing more minority samples along the boundary helps the model learn the minority class better. Conversely, points far from the boundary are already easy to classify, so oversampling them is unlikely to make the model a better classifier.
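To make the notion of a "borderline" minority sample concrete, here is a minimal sketch in plain Python. It assumes the neighbor-counting rule commonly described for Borderline-SMOTE: a minority sample counts as borderline when at least half, but not all, of its k nearest neighbors belong to the majority class. A sample whose neighbors are all majority is treated as noise and skipped. The function name `borderline_minority`, the value of `k`, and the toy data are hypothetical and chosen only for illustration.

```python
import math


def borderline_minority(X, y, minority_label, k=5):
    """Return indices of minority samples that look 'borderline'.

    Assumed rule: a minority sample is borderline when at least half,
    but not all, of its k nearest neighbors are majority samples.
    """
    borderline = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        if yi != minority_label:
            continue
        # Distance from sample i to every other sample in the dataset.
        others = [(math.dist(xi, xj), j) for j, xj in enumerate(X) if j != i]
        neighbors = [j for _, j in sorted(others)[:k]]
        m = len(neighbors)
        n_majority = sum(1 for j in neighbors if y[j] != minority_label)
        # All-majority neighborhoods are treated as noise, not borderline.
        if m / 2 <= n_majority < m:
            borderline.append(i)
    return borderline


# Toy data laid out on a line: majority (0) on the left, minority (1) on the right,
# plus one isolated minority point (x = 1.5) sitting inside the majority region.
xs = [0, 1, 2, 3, 4, 4.6, 5.5, 7.8, 8.5, 9, 9.4, 1.5]
X = [(x, 0.0) for x in xs]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1]

danger = borderline_minority(X, y, minority_label=1, k=3)
print(danger)  # [5]
```

Only the minority point at x = 4.6 is selected: two of its three nearest neighbors are majority samples. The isolated point at x = 1.5 is surrounded entirely by majority samples and is skipped as likely noise, while the remaining minority points sit safely inside their own class. A Borderline-SMOTE style oversampler would then generate synthetic samples from the selected points only.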
Here’s a step-by-step algorithm for Borderline-SMOTE:
- We run a...