Strategies for removing easy observations
The reverse of the strategy of removing the rich and famous Capulets is to remove the poor and weak Capulets. This section discusses techniques for removing the majority samples that lie far away from the minority samples. Instead of removing the samples near the boundary between the two classes, we keep them for training, so the model learns to discriminate between the classes more sharply. The downside is that these algorithms risk retaining noisy data points, which then end up in the training set and can introduce noise into the predictive system.
Condensed Nearest Neighbors
Condensed Nearest Neighbors (CNNeighbors) [11] is an algorithm that works as follows:
- We add all minority samples to a set, along with one randomly selected majority sample. Let's call this set C.
- We train a KNN model with k = 1 on set C.
- Now, we repeat the following four steps for each of the remaining majority samples (a sketch of this loop follows the list)...
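The text elides the remaining loop, so as a point of reference, here is a minimal Python sketch of the standard CNN absorption rule: classify each remaining majority sample with the current 1-NN model and absorb it into C only if it is misclassified. The function name `condensed_nearest_neighbors` and the `minority_label` parameter are illustrative choices, not from the text.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def condensed_nearest_neighbors(X, y, minority_label=1, seed=0):
    """Undersample the majority class with the CNN absorption rule."""
    rng = np.random.default_rng(seed)
    minority_idx = np.flatnonzero(y == minority_label)
    majority_idx = np.flatnonzero(y != minority_label)

    # C starts as all minority samples plus one random majority sample.
    start = int(rng.choice(majority_idx))
    C = list(minority_idx) + [start]

    # Fit a 1-NN model on the current contents of C.
    knn = KNeighborsClassifier(n_neighbors=1).fit(X[C], y[C])

    # Scan the remaining majority samples; any sample the current model
    # misclassifies is absorbed into C, and the model is refit.
    # Hart's original formulation repeats this scan until C stops growing;
    # a single pass is shown here for brevity.
    for i in majority_idx:
        if i == start:
            continue
        if knn.predict(X[i : i + 1])[0] != y[i]:
            C.append(int(i))
            knn = KNeighborsClassifier(n_neighbors=1).fit(X[C], y[C])

    C = np.asarray(C)
    return X[C], y[C]
```

In practice, the imbalanced-learn library ships a ready-made version of this under-sampler as CondensedNearestNeighbour in imblearn.under_sampling, exposed through the usual fit_resample(X, y) interface.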