Embedding for supervised and unsupervised fraud detection
In this section, we describe how graph machine learning algorithms can use the bipartite and tripartite graphs introduced earlier to build automated fraud detection procedures, using both supervised and unsupervised approaches. As discussed at the beginning of this chapter, transactions are represented by edges, so the task is to classify each edge as either fraudulent or genuine.
The pipeline we will use for the classification task consists of the following steps:
- A sampling procedure to handle the class imbalance
- The use of an unsupervised embedding algorithm to create a feature vector for each edge
- The application of supervised and unsupervised machine learning algorithms to the feature space defined in the previous step
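The first two steps of the pipeline can be illustrated with a minimal sketch in plain Python. Everything in it is an assumption made for illustration: random undersampling of the majority class is only one possible sampling procedure, the Hadamard (element-wise) product is only one possible way to turn two node vectors into an edge vector, and the toy `edges`, `labels`, and `node_vectors` stand in for the real transaction graph and for the output of whichever node embedding algorithm is used.

```python
import random


def undersample(edges, labels, seed=0):
    """Randomly keep the same number of edges from every class."""
    rng = random.Random(seed)
    by_class = {}
    for edge, label in zip(edges, labels):
        by_class.setdefault(label, []).append(edge)
    # The smallest class (here, the fraudulent one) sets the sample size.
    n = min(len(group) for group in by_class.values())
    sampled = [
        (edge, label)
        for label, group in by_class.items()
        for edge in rng.sample(group, n)
    ]
    rng.shuffle(sampled)
    return [edge for edge, _ in sampled], [label for _, label in sampled]


def hadamard_edge_features(edges, node_vectors):
    """Build one feature vector per edge as the element-wise product
    of the embeddings of its two endpoints."""
    return [
        [a * b for a, b in zip(node_vectors[u], node_vectors[v])]
        for u, v in edges
    ]


# Hypothetical toy data: (customer, merchant) edges, 1 = fraudulent.
edges = [("c1", "m1"), ("c1", "m2"), ("c2", "m1"),
         ("c2", "m2"), ("c3", "m1"), ("c3", "m2")]
labels = [0, 0, 1, 0, 0, 1]

# Hypothetical node embeddings; in practice these would come from an
# unsupervised embedding algorithm trained on the graph.
node_vectors = {
    "c1": [0.5, 2.0], "c2": [1.0, -1.0], "c3": [0.25, 4.0],
    "m1": [4.0, 0.25], "m2": [-2.0, 0.5],
}

balanced_edges, balanced_labels = undersample(edges, labels)
features = hadamard_edge_features(balanced_edges, node_vectors)
```

The resulting `features` and `balanced_labels` form the feature space referred to in the third step: they can be passed to a supervised classifier, or `features` alone to a clustering algorithm for the unsupervised case.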
Supervised approach to fraudulent transaction identification
Since our dataset is strongly imbalanced, with fraudulent transactions representing 2.83%...