Model performance comparison of various oversampling methods
Let’s examine how two popular models, logistic regression and random forest, perform with the oversampling techniques we’ve discussed. We’ll use two datasets for this comparison: one synthetic and one real-world. On each, we’ll evaluate four oversampling techniques against a baseline that uses no sampling.
You can find all the related code in this book’s GitHub repository. Figure 2.15 and Figure 2.16 show the average precision scores of both models on the two datasets:
Figure 2.15 – Performance comparison of various oversampling techniques on a synthetic dataset
Figure 2.16 – Performance comparison of various oversampling techniques on the thyroid_sick dataset
Based on these plots, we can draw some useful conclusions:
- Effectiveness of oversampling: In general, using...