Representing the data in a 3D chart
As a first example, we apply our knowledge of grouping to find outliers that are possible candidates for fraud research. For example, we could identify outliers with small spending amounts and very early transaction hours across 6 consecutive days. This behavior probably does not match the average transaction amount or the typical working hours in which transactions take place.
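The grouping idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used later in the chapter: the (hour, amount) pairs are invented sample values, the helper names min_max_scale and kmeans are hypothetical, the first k points are used as starting centroids to keep the run deterministic, and the smallest group is simply treated as the set of outlier candidates.

```python
import math
from collections import Counter

def min_max_scale(points):
    """Rescale every column to the 0..1 range so no column dominates the distance."""
    cols = list(zip(*points))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]  # avoid division by zero
    return [tuple((v - lo) / s for v, lo, s in zip(p, lows, spans)) for p in points]

def kmeans(points, k, iterations=20):
    """Plain K-means on a list of equal-length numeric tuples."""
    # Assumption: deterministic start, the first k points are the initial centroids.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a group ends up empty
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids

# Invented sample data: (hour of day, amount spent). Not taken from the dataset.
transactions = [
    (10, 55.0), (3, 2.5), (11, 60.0), (14, 48.0), (9, 70.0),
    (15, 52.0), (13, 65.0), (4, 1.9), (16, 58.0), (12, 45.0),
]

labels, _ = kmeans(min_max_scale(transactions), k=2)

# Treat the smallest group as the outlier candidates worth a closer look.
counts = Counter(labels)
rare = min(counts, key=counts.get)
candidates = [t for t, lab in zip(transactions, labels) if lab == rare]
print(candidates)
```

Scaling the columns first matters because the amount has a much wider range than the hour and would otherwise dominate the distance calculation. A small group is only a hint that the transactions are unusual; it does not prove fraud.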
Kaggle credit card fraud dataset
The credit card fraud data has several columns:
- The number of seconds since the first transaction was recorded in the dataset.
- The amount expended by the cardholder.
- The V1 to V12 columns represent encrypted data to protect the original information. These are numerical fields, and the K-means algorithm can classify these values into groups.
The only true values of the data are seconds and amount. The V1 to V12 fields have been altered with encryption techniques as a privacy measure.
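Because the seconds column counts from the first recorded transaction, it has to be converted before we can reason about days and hours of the day. A minimal sketch of that arithmetic follows. It assumes the first transaction is treated as happening at 00:00 on day 0, which the dataset description above does not confirm, and the function name day_and_hour is hypothetical.

```python
SECONDS_PER_HOUR = 3600
SECONDS_PER_DAY = 24 * SECONDS_PER_HOUR

def day_and_hour(seconds_since_first):
    """Split elapsed seconds into (day index, hour of day).

    Assumption: the first transaction is treated as occurring at 00:00 on day 0.
    """
    day, remainder = divmod(int(seconds_since_first), SECONDS_PER_DAY)
    return day, remainder // SECONDS_PER_HOUR

# 93,600 seconds is one full day (86,400 s) plus two hours (7,200 s).
print(day_and_hour(93600))
```

With the day index and the hour available as separate values, a transaction can be compared against typical working hours for each day.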
We are going to perform statistical grouping...