Methods of risk scoring
Having described our business use case, and prepared our Apache Spark computing platform, in this section, we need to select our analytical methods or predictive models (equations) for this machine learning project for risk scoring, which is to complete a task of mapping our risk modelling case to machine learning methods.
To model and predict loan defaults, logistic regression and decision tree are among the most utilized methods. For our exercise, we will use both. But we will focus on logistic regression, because logistic regression, if well developed in combination with decision trees, can outperform most of the other methods.
As always, once we finalize our decision for analytical methods or models, we will need to prepare our coding, which will be in R for this chapter.
Logistic regression
Logistic regression measures the relationship between one categorical dependent variable and one or more independent variables by estimating probabilities using a logistic function...