Summary
In this chapter, we learned about imbalanced datasets and strategies for addressing imbalanced datasets. We introduced the use cases where imbalanced datasets would be encountered. We looked at the challenges posed by imbalanced datasets and we were introduced to the metrics that should be used in the case of an imbalanced dataset. We formulated strategies for dealing with imbalanced datasets and implemented different strategies, such as random undersampling and oversampling, for balancing datasets. We then fit different models after balancing the datasets and analyzed the results.
Balancing datasets is a very effective way to improve the performance of your classifiers. However, it should be noted that there could be a degradation of overall accuracy measures for the majority class due to balancing. What strategies to adopt in what situations should be arrived at based on the problem statement and also after rigorous experiments for those problem statements.
Having learned...