Cost-Sensitive Learning
So far, we have studied various sampling techniques and ways to oversample or undersample data. However, both of these techniques come with their own issues. Oversampling can easily lead to overfitting because the model repeatedly sees exact or near-duplicate copies of the same examples. Undersampling, on the other hand, discards majority-class examples to balance the training dataset and, in doing so, throws away information that could have been useful to the model. In this chapter, we'll consider an alternative to the data-level techniques that we learned about previously.
Cost-sensitive learning is an effective strategy for tackling imbalanced data. We will work through this technique and see why it can be useful. Along the way, we'll look at some of the details of cost functions and why machine learning models are not designed to deal with imbalanced datasets by default. While models aren't equipped to handle imbalance out of the box, cost-sensitive learning addresses this by assigning a higher cost to errors on the minority class, so the model is penalized more heavily for the mistakes we care about most.
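As a preview of the idea, here is a minimal sketch using scikit-learn's class_weight parameter, which reweights the loss so that minority-class errors cost more. The synthetic dataset and the "balanced" weighting scheme below are illustrative choices, not the specific setup used later in this chapter.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Create a synthetic imbalanced dataset (roughly 5% positives).
X, y = make_classification(
    n_samples=10_000, n_features=20, weights=[0.95, 0.05], random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=42
)

# Baseline: every misclassification carries the same cost.
baseline = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Cost-sensitive: class_weight="balanced" scales each class's contribution
# to the loss by n_samples / (n_classes * count_of_that_class), so mistakes
# on the rare class are penalized more heavily.
cost_sensitive = LogisticRegression(
    class_weight="balanced", max_iter=1000
).fit(X_train, y_train)

print(classification_report(y_test, baseline.predict(X_test)))
print(classification_report(y_test, cost_sensitive.predict(X_test)))
```

Comparing the two reports typically shows the cost-sensitive model trading some precision for noticeably better recall on the minority class, which is exactly the knob that cost-sensitive learning gives us and that we'll explore in depth in this chapter.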