In this chapter, we will attempt to classify fraudulent transactions in a dataset of credit card transactions made by European cardholders in September 2013. The main difficulty with this dataset is that fraudulent transactions make up an extremely small fraction of its records. Datasets of this kind are called unbalanced (or imbalanced), because the labels occur in very unequal proportions. We will try to build ensembles that can still identify the few fraudulent transactions despite this imbalance.
In this chapter, we will cover the following topics:
- Getting familiar with the dataset
- Exploratory analysis
- Voting
- Stacking
- Bagging
- Boosting
- Using random forests
- Comparative analysis of ensembles
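
To see why an unbalanced dataset needs special care, consider the following minimal sketch. It uses synthetic labels, and the counts (`n_total`, `n_fraud`) are made up for illustration rather than taken from the actual dataset. The helper functions `accuracy` and `recall` are likewise hypothetical, written in plain Python so that the example has no dependencies. A model that always predicts "legitimate" scores a very high accuracy while catching no fraud at all.

```python
# Synthetic labels: 1 = fraudulent, 0 = legitimate.
# These counts are illustrative assumptions, not the real dataset's figures.
n_total = 10_000
n_fraud = 20
y_true = [1] * n_fraud + [0] * (n_total - n_fraud)

# A trivial "classifier" that ignores its input and always
# predicts the majority class (legitimate).
y_pred = [0] * n_total


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives that were predicted as positive."""
    true_pos = sum(t == positive and p == positive
                   for t, p in zip(y_true, y_pred))
    actual_pos = sum(t == positive for t in y_true)
    return true_pos / actual_pos if actual_pos else 0.0


acc = accuracy(y_true, y_pred)
rec = recall(y_true, y_pred)
print(f"accuracy: {acc:.4f}")  # high, despite the model being useless
print(f"recall:   {rec:.4f}")  # no fraudulent transaction is detected
```

Because accuracy is dominated by the majority class, a metric that focuses on the minority class, such as recall, is usually more informative when comparing models on data like this.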