Building a fraud detection example
This section shows you how to build a simple fraud detection example using real, sanitized credit card data available on Kaggle. The transactions occurred in September 2013, and only 492 of the 284,807 transactions are frauds (about 0.17 percent). The dataset is therefore highly unbalanced, which leaves few fraud examples for training a model. The data has been transformed by Principal Component Analysis (PCA) using the techniques demonstrated in the Relying on Principal Component Analysis section of Chapter 6, Detecting and Analyzing Anomalies. Only the Time and Amount columns have their original values in them. The Class column has been added to label the data. You can also find the source code for this example in the MLSec; 08; Perform Fraud Detection.ipynb file of the downloadable source.
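To make the imbalance concrete, the following minimal sketch works through the arithmetic using only the two counts quoted above. The variable names are illustrative and are not taken from the example notebook.

```python
# Counts quoted in the text: 492 frauds out of 284,807 transactions.
total_transactions = 284_807
fraud_transactions = 492

legitimate_transactions = total_transactions - fraud_transactions

# Share of all transactions that are fraudulent.
fraud_rate = fraud_transactions / total_transactions

# Number of legitimate transactions for every fraudulent one.
legit_per_fraud = legitimate_transactions / fraud_transactions

# Accuracy of a useless model that labels every transaction as legitimate.
baseline_accuracy = legitimate_transactions / total_transactions

print(f"Fraud rate: {fraud_rate:.3%}")
print(f"Legitimate transactions per fraud: {legit_per_fraud:.0f}")
print(f"Accuracy when always predicting 'legitimate': {baseline_accuracy:.3%}")
```

The last figure is the reason plain accuracy is a poor yardstick for data like this: a model that never flags a fraud still appears to be right almost all of the time.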
Getting the data
The dataset used in this example appears at https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud?resource=download. The data is in a 69 MB .zip file. Download the file manually...
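Once the archive is on disk, one way to sanity-check the download is to count the labels in the Class column without unpacking the file. The sketch below uses only the Python standard library. The helper name count_classes, the archive name in the commented call, and the member name creditcard.csv are assumptions made for illustration, so adjust them to match your download.

```python
import csv
import io
import zipfile
from collections import Counter


def count_classes(zip_path, member="creditcard.csv", label_column="Class"):
    """Count the rows carrying each label in a CSV stored inside a .zip file.

    The default member and column names are assumptions about the archive
    layout; pass different values if your download differs.
    """
    counts = Counter()
    with zipfile.ZipFile(zip_path) as archive:
        # Read the CSV member as text directly from the archive.
        with archive.open(member) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            for row in csv.DictReader(text):
                counts[row[label_column]] += 1
    return counts


# Example call (the archive name is a placeholder for your downloaded file):
# print(count_classes("creditcardfraud.zip"))
```

If the counts for the two labels match the 492 frauds and 284,807 total transactions described earlier, the download is most likely complete.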