Technical requirements
The Python notebooks for this chapter are available on GitHub at https://github.com/PacktPublishing/Machine-Learning-for-Imbalanced-Data/tree/master/chapter04. As usual, you can open the GitHub notebook using Google Colab by clicking on the Open in Colab icon at the top of this chapter’s notebook or by launching it from https://colab.research.google.com using the GitHub URL of the notebook.
In this chapter, we will continue to use a synthetic dataset generated using scikit-learn's make_classification API, just as we did in the previous chapters. Toward the end of the chapter, we will test these methods on some real datasets. Our full dataset contains 90,000 examples with a 1:99 imbalance ratio. Here is what the training dataset looks like:
Figure 4.2 – Plot of a dataset with a 1:99 imbalance ratio
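As a rough illustration, a dataset with these properties could be generated along the following lines. This is a minimal sketch, not the exact code from the chapter's notebook: the sample count and the 1:99 ratio come from the text, while the number of features, the `flip_y` and `class_sep` settings, the random seed, and the train/test split size are all assumptions chosen for illustration.

```python
from collections import Counter

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# 90,000 examples with roughly a 1:99 minority-to-majority ratio (from the text).
# Everything else below is an assumed setting, not the notebook's actual values.
X, y = make_classification(
    n_samples=90_000,
    n_features=2,            # assumption: two features, convenient for a 2D plot
    n_informative=2,
    n_redundant=0,
    n_repeated=0,
    n_classes=2,
    n_clusters_per_class=1,
    weights=[0.99, 0.01],    # majority class 0, minority class 1
    flip_y=0,                # assumption: no label noise, so the ratio stays close to 1:99
    class_sep=2.0,           # assumption
    random_state=42,         # assumption
)

# Stratify so the training and test sets keep the same class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42  # assumption: 80/20 split
)

print("Full dataset class counts:", Counter(y))
print("Training set class counts:", Counter(y_train))
```

Setting `flip_y=0` keeps make_classification from randomly reassigning labels, which would otherwise shift the class ratio slightly away from the requested weights. Passing `stratify=y` preserves the imbalance ratio in both splits, which matters when the minority class has only a few hundred examples.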
With our imbalanced dataset ready to use, let’s look at the first ensembling method, called bagging...