Chapter 1 – Introduction to Data Imbalance in Machine Learning
- The choice of loss function can strongly affect how a model performs on imbalanced datasets, because some loss functions are more sensitive to class imbalance than others. For instance, standard (unweighted) cross-entropy averages the per-example loss over the whole training set, so the majority class contributes most of the total loss and most of the gradient. A model trained this way can be heavily influenced by the majority class and perform poorly on the minority class.
- The precision-recall (PR) curve is more informative than the ROC curve on highly skewed datasets because both of its axes, precision and recall, measure performance on the positive (minority) class, which is often the class of interest. The ROC curve instead plots the true positive rate (TPR) against the false positive rate (FPR). Because the FPR is normalized by the number of negatives, a large absolute number of false positives still yields a small FPR when the negative class dominates, so the ROC curve can give an overly optimistic view of the model’s performance.
- Accuracy can be a misleading metric for model performance on imbalanced datasets because it does not take...
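
The point about cross-entropy being dominated by the majority class can be illustrated with a minimal sketch. The function `binary_cross_entropy`, its `pos_weight` parameter, and the toy data are assumptions made for illustration only; setting the weight to the negative-to-positive ratio is one common heuristic, not the only option.

```python
import math

def binary_cross_entropy(y_true, p_pred, pos_weight=1.0):
    """Mean binary cross-entropy; positive-class terms are scaled by pos_weight."""
    eps = 1e-12  # clip probabilities to avoid log(0)
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)
        if y == 1:
            total += -pos_weight * math.log(p)
        else:
            total += -math.log(1.0 - p)
    return total / len(y_true)

# Hypothetical toy data: 95 negatives and 5 positives.
y_true = [0] * 95 + [1] * 5
# A model that assigns a low positive probability to every example,
# i.e. it effectively ignores the minority class.
p_const = [0.05] * 100

# Unweighted loss: the 95 easy negatives keep the average low.
plain = binary_cross_entropy(y_true, p_const)

# Weighted loss: positives are up-weighted by n_neg / n_pos = 19,
# so missing the minority class is penalized much more heavily.
weighted = binary_cross_entropy(y_true, p_const, pos_weight=95 / 5)

print(f"unweighted: {plain:.4f}  weighted: {weighted:.4f}")
```

With the unweighted loss, a model that ignores the minority class still scores a low average loss, so training has little incentive to fix it. Up-weighting the positive terms makes the same behavior costly.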
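
The metric comparisons above (accuracy, ROC, and precision-recall) can be made concrete with a small sketch computed from confusion-matrix counts. The helper `confusion_metrics` and both sets of counts are hypothetical, chosen only to illustrate a 1% positive rate. Reporting precision as 0.0 when there are no positive predictions is a convention assumed here.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Return accuracy, TPR (recall), FPR, and precision from confusion counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    tpr = tp / (tp + fn)   # recall: share of actual positives that are found
    fpr = fp / (fp + tn)   # share of actual negatives that are flagged
    # Assumed convention: precision is 0.0 when nothing is predicted positive.
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    return {"accuracy": accuracy, "tpr": tpr, "fpr": fpr, "precision": precision}

# Hypothetical dataset: 10 positives, 990 negatives.
# Classifier A predicts "negative" for every example.
always_negative = confusion_metrics(tp=0, fp=0, fn=10, tn=990)
# Classifier B finds 8 of the 10 positives but raises 50 false alarms.
detector = confusion_metrics(tp=8, fp=50, fn=2, tn=940)

for name, m in [("always_negative", always_negative), ("detector", detector)]:
    print(name, {k: round(v, 3) for k, v in m.items()})
```

Under these assumed counts, the always-negative classifier reaches 99% accuracy while finding no positives at all. The second classifier's ROC operating point looks strong (TPR 0.8 at an FPR of about 0.05), yet its precision is only about 0.14, which is the gap a PR curve exposes.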