Fraud detection
Identifying fraudulent transactions is one of the most important components of risk management. R has many functions and packages that can be used to find fraudulent transactions, including binary classification techniques such as logistic regression, decision trees, and random forests. We will again use a subset of the German Credit data available in R. In this section, we are going to use a random forest for fraud detection. As with logistic regression, we could begin with a basic exploratory analysis to understand the attributes. Here we skip that step and instead use the labeled data to train a random forest model, and then predict fraud on the validation data.
The dataset used for the analysis is created by executing the following code:
> library(caret)   # assumed source of GermanCredit; skip if already loaded
> data(GermanCredit)
> FraudData <- GermanCredit[, 1:10]
> head(FraudData)
This prints the first few rows of the sample data:
Figure 7.17: Sample data used...
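The workflow described above (train a random forest on labeled data, then predict on validation data) can be sketched roughly as follows. This is only an illustration under stated assumptions, not necessarily the exact procedure used in this chapter: it assumes GermanCredit comes from the caret package, that the Class column (the tenth column) serves as the label, that the randomForest package is installed, and that a 70/30 train/validation split is acceptable. The names TrainData, ValidData, RfModel, and PredictedClass are hypothetical and chosen for illustration.

```r
# Minimal sketch: random forest on a subset of the German Credit data.
# Assumptions: caret provides GermanCredit, Class is the label column,
# and a 70/30 train/validation split is used (not taken from the text).
library(caret)
library(randomForest)

data(GermanCredit)
FraudData <- GermanCredit[, 1:10]

# Reproducible random split into training and validation rows
set.seed(123)
TrainIndex <- sample(nrow(FraudData), size = floor(0.7 * nrow(FraudData)))
TrainData  <- FraudData[TrainIndex, ]
ValidData  <- FraudData[-TrainIndex, ]

# Fit a random forest classifier using all other columns as predictors
RfModel <- randomForest(Class ~ ., data = TrainData, ntree = 500)

# Predict the class of each validation row
PredictedClass <- predict(RfModel, newdata = ValidData)

# Cross-tabulate predicted against actual labels
ConfusionTable <- table(Predicted = PredictedClass, Actual = ValidData$Class)
print(ConfusionTable)
```

The cross-tabulation shows how many validation rows fall into each predicted/actual combination, which gives a first impression of how well the model separates the two classes.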