Guidance for using various oversampling techniques
Now, let’s review some guidelines on how to choose among the oversampling techniques we have covered and how these techniques differ from each other:
- Train a model without applying any sampling technique. This gives us our baseline performance. Any oversampling technique we apply is expected to improve on this baseline (a minimal end-to-end sketch of these steps follows this list).
- Start with random oversampling and add some shrinkage as well. We may have to try a few shrinkage values to see whether the model’s performance improves.
- When the data has categorical features, we have a couple of options:
- Convert all categorical features into numerical features first using one-hot encoding, label encoding, feature hashing, or another feature transformation technique, and then apply any oversampling technique that works on numerical data.
- (Only for nominal categorical features) Use SMOTENC (when the data has a mix of numerical and categorical features) or SMOTEN (when all features are categorical) directly on the data.
- Apply various oversampling techniques – random oversampling, SMOTE, Borderline-SMOTE, and...
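Here is a minimal sketch of the workflow above, using scikit-learn and imbalanced-learn. The synthetic dataset, the logistic regression model, the average-precision metric, and the specific shrinkage values are illustrative assumptions, not recommendations from the text; likewise, the "categorical" column passed to SMOTENC is faked by bucketing a numerical feature, purely for demonstration.

```python
# A minimal sketch of the guidelines above. Dataset, model, metric, and
# shrinkage values are illustrative assumptions, not prescriptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score
from sklearn.model_selection import train_test_split
from imblearn.over_sampling import SMOTE, SMOTENC, RandomOverSampler

# Imbalanced toy data: roughly 90% majority class, 10% minority class.
X, y = make_classification(
    n_samples=5_000, n_features=10, weights=[0.9], random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=42
)

def evaluate(X_resampled, y_resampled):
    """Fit a simple model and score it on the untouched test set."""
    model = LogisticRegression(max_iter=1_000)
    model.fit(X_resampled, y_resampled)
    scores = model.predict_proba(X_test)[:, 1]
    return average_precision_score(y_test, scores)

# Step 1: baseline model, no sampling applied.
print(f"baseline: {evaluate(X_train, y_train):.3f}")

# Step 2: random oversampling; try a few shrinkage values. shrinkage=0
# duplicates minority samples exactly, while larger values add noise
# around them (a smoothed bootstrap).
for shrinkage in (0, 0.5, 1.0):
    ros = RandomOverSampler(shrinkage=shrinkage, random_state=42)
    X_res, y_res = ros.fit_resample(X_train, y_train)
    print(f"random oversampling (shrinkage={shrinkage}): "
          f"{evaluate(X_res, y_res):.3f}")

# Step 3: SMOTE, applicable here because every feature is numerical.
X_res, y_res = SMOTE(random_state=42).fit_resample(X_train, y_train)
print(f"SMOTE: {evaluate(X_res, y_res):.3f}")

# With mixed data, SMOTENC can resample directly. Here we fake a nominal
# column by bucketing feature 0 into two category codes (an illustrative
# assumption) and tell SMOTENC which column index is categorical.
X_mixed = X_train.copy()
X_mixed[:, 0] = (X_mixed[:, 0] > 0).astype(int)
smote_nc = SMOTENC(categorical_features=[0], random_state=42)
X_res, y_res = smote_nc.fit_resample(X_mixed, y_train)
print(f"SMOTENC resampled class counts: {np.bincount(y_res)}")
```

Comparing the held-out scores printed at each step is how we can tell whether a given technique actually beats the baseline; on real data, cross-validation would give a more reliable comparison than this single split.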