Methods for fraud detection
In the previous section, we described our business use case and prepared our Spark computing platform and our datasets. In this section, we select the analytical methods or predictive models (equations) for this fraud detection project; that is, we map our business use case to machine learning methods.
For fraud detection, both supervised and unsupervised machine learning are commonly used. For this case, however, we will use supervised machine learning, because we have good data for our target variable of fraud and because our practical goal is to reduce fraud while continuing business transactions.
Many models are suitable for modeling and predicting fraud, including logistic regression and decision trees. Selecting one among them can be difficult, because the best choice depends on the data to be used. One solution is to first run all the models and then select the best ones using...
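The idea of running several candidate models and keeping the best one can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the method used in this project: the data is synthetic, the two features (a scaled amount and a scaled hour) are hypothetical, a depth-1 tree (a decision stump) stands in for a full decision tree, and holdout accuracy is an assumed selection criterion chosen for simplicity. With imbalanced fraud labels, accuracy can be misleading, so a real project would likely prefer a different measure and would fit the models with the Spark machine learning libraries rather than by hand.

```python
import math
import random


def make_toy_data(n, seed=0):
    """Synthetic (amount, hour) rows; label 1 marks a simulated fraud (assumption)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        amount = rng.uniform(0.0, 1.0)  # hypothetical scaled transaction amount
        hour = rng.uniform(0.0, 1.0)    # hypothetical scaled time of day
        noise = rng.gauss(0.0, 0.1)
        label = 1 if amount + 0.5 * hour + noise > 1.0 else 0
        rows.append(([amount, hour], label))
    return rows


def fit_logistic(train, epochs=200, lr=0.5):
    """Logistic regression fitted by batch gradient descent on the log loss."""
    n_features = len(train[0][0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in train:
            z = b + sum(wi * xi for wi, xi in zip(w, x))
            p = 1.0 / (1.0 + math.exp(-z))  # predicted fraud probability
            for j, xj in enumerate(x):
                grad_w[j] += (p - y) * xj
            grad_b += p - y
        w = [wi - lr * g / len(train) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(train)
    # Classify as fraud when the predicted probability exceeds 0.5 (z > 0).
    return lambda x: 1 if b + sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0


def fit_stump(train):
    """Depth-1 decision tree: the single split with the fewest training errors."""
    best = None
    for j in range(len(train[0][0])):
        for threshold in sorted({x[j] for x, _ in train}):
            for above in (0, 1):  # label predicted when the feature exceeds the threshold
                errors = sum(
                    1 for x, y in train
                    if (above if x[j] > threshold else 1 - above) != y
                )
                if best is None or errors < best[0]:
                    best = (errors, j, threshold, above)
    _, best_j, best_threshold, best_above = best
    return lambda x: best_above if x[best_j] > best_threshold else 1 - best_above


def accuracy(model, rows):
    """Fraction of rows whose predicted label matches the true label."""
    return sum(1 for x, y in rows if model(x) == y) / len(rows)


data = make_toy_data(400)
train, holdout = data[:300], data[300:]

# Fit every candidate on the same training rows, then score on held-out rows.
candidates = {"logistic_regression": fit_logistic, "decision_stump": fit_stump}
scores = {name: accuracy(fit(train), holdout) for name, fit in candidates.items()}
best_name = max(scores, key=scores.get)

print(scores)
print("selected:", best_name)
```

Scoring on rows that were not used for fitting matters here: a model chosen by its training score could simply be the one that memorized the training data best.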