In a fraud dataset, we typically have plenty of data for the negative class (non-fraud/genuine transactions) and very few or no examples of the positive class (fraudulent transactions). This is known as the class imbalance problem in the ML world. We train an AE on the non-fraud data only, so that the encoder learns a compressed representation of genuine transactions and the decoder learns to reconstruct them from it. We then compute the reconstruction error (the difference between each input and the decoder's output) on the training set and use it to choose a threshold. This threshold is applied to unseen data (the test dataset or otherwise): instances whose reconstruction error is greater than the threshold are flagged as fraud, since an AE trained only on genuine transactions should reconstruct fraudulent ones poorly.
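The thresholding step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it does not train an AE, and the reconstructions are made-up toy values standing in for a decoder's output. The function names, the use of mean squared error as the reconstruction error, and the choice of a 95% quantile for the threshold are all assumptions made for this sketch, not choices prescribed by this chapter.

```python
import math


def reconstruction_error(x, x_hat):
    """Mean squared error between one input row and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def fit_threshold(train_errors, quantile=0.95):
    """Pick a threshold as an empirical (nearest-rank) quantile of the
    reconstruction errors measured on the non-fraud training data.
    The 0.95 default is an assumption, not a recommended value."""
    ordered = sorted(train_errors)
    rank = max(1, math.ceil(quantile * len(ordered)))
    return ordered[rank - 1]


def flag_fraud(errors, threshold):
    """Flag an instance as fraud when its error exceeds the threshold."""
    return [e > threshold for e in errors]


# Toy non-fraud training rows and hypothetical AE reconstructions of them.
train_inputs = [[1.0, 2.0], [2.0, 1.0], [1.5, 1.5], [2.0, 2.0]]
train_recons = [[1.1, 2.0], [2.0, 0.9], [1.5, 1.7], [1.9, 2.1]]
train_errors = [reconstruction_error(x, r)
                for x, r in zip(train_inputs, train_recons)]
threshold = fit_threshold(train_errors)

# Unseen rows: the second is reconstructed poorly, as we would expect
# for a transaction unlike anything the AE saw during training.
test_inputs = [[1.0, 2.0], [9.0, 0.5]]
test_recons = [[1.05, 1.95], [2.0, 1.8]]
test_errors = [reconstruction_error(x, r)
               for x, r in zip(test_inputs, test_recons)]

flags = flag_fraud(test_errors, threshold)
print(flags)  # [False, True]
```

In practice the quantile is a tuning knob: a lower value catches more fraud at the cost of more false alarms on genuine transactions, and a higher value does the opposite.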
For the project in this chapter, we will use a public dataset of credit card transactions, sourced from https://essentials.togaware.com/data/. The dataset was originally made available through the research...