The dataset was originally used in Andrea Dal Pozzolo's PhD thesis, Adaptive Machine Learning for Credit Card Fraud Detection (ULB MLG), and has since been released by its authors for public use (www.ulb.ac.be/di/map/adalpozz/data/creditcard.Rdata). The dataset contains more than 284,000 instances, of which only 492 (about 0.17%) are fraudulent.
Its target class value is 0 if the transaction was not fraudulent, and 1 if it was. To preserve the confidentiality of the data, the original features were transformed using Principal Component Analysis (PCA). The dataset's features consist of 28 PCA components, together with the transaction’s amount and the time elapsed since the first transaction in the dataset. Descriptive statistics about the dataset are provided...
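To make the feature layout and the class imbalance concrete, the following minimal sketch lists the 30 features described above and computes the fraud fraction from a sequence of labels. The column names (`Time`, `V1` to `V28`, `Amount`, `Class`) are an assumption based on the commonly distributed CSV version of the dataset and are not given in the text. The helper `class_balance` is hypothetical, and the labels used here are a toy stand-in, not the real data.

```python
from collections import Counter

# Assumed column names (not stated in the text): 28 PCA components plus
# the elapsed time and the transaction amount, with a binary target.
FEATURE_COLUMNS = ["Time"] + [f"V{i}" for i in range(1, 29)] + ["Amount"]
TARGET_COLUMN = "Class"  # 0 = legitimate transaction, 1 = fraud


def class_balance(labels):
    """Return (counts, fraud_fraction) for an iterable of 0/1 labels."""
    counts = Counter(int(y) for y in labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no labels provided")
    return counts, counts[1] / total


# Toy labels for illustration only: 2 fraudulent cases out of 1,000.
toy_labels = [0] * 998 + [1] * 2
counts, fraud_fraction = class_balance(toy_labels)

print(len(FEATURE_COLUMNS))          # 30
print(f"{fraud_fraction:.2%}")       # 0.20%
```

Applied to the real target column, the same function would report a fraud fraction of roughly 0.17%. At that rate plain accuracy is a poor yardstick for this dataset, since a classifier that never predicts fraud would still be correct more than 99.8% of the time.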