Methods of attrition prediction
In the previous section, we described our use case of predicting student attrition and prepared our Spark computing platform. In this section, we map the use case to machine learning methods; that is, we select the analytical methods or predictive models (equations) for this attrition prediction project.
To model and predict student attrition, logistic regression and decision trees are among the most suitable models, as both yield good results. Some researchers use neural networks and SVM models, but their results are no better than those of logistic regression. Therefore, for this exercise, we will focus on logistic regression and decision trees, along with random forest as an extension of the decision tree, and then use model evaluation to determine which one performs best.
As always, once we finalize our decision regarding analytical methods or models, we need to prepare for coding.
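To make the idea of fitting several candidate models and letting model evaluation pick one more concrete, here is a minimal sketch in plain Python. It is not the Spark code used in this project. The data is synthetic, the feature names (gpa, absences) and every function name are hypothetical, and a one-split decision stump stands in for a full decision tree. Random forest is left out to keep the sketch short.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_synthetic_students(n, seed):
    """Build hypothetical rows of ([gpa, absences], dropped_out); not real data."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        gpa = rng.uniform(1.0, 4.0)
        absences = rng.uniform(0.0, 20.0)
        # Assumed relationship: lower GPA and more absences raise attrition risk.
        risk = sigmoid(-1.5 * (gpa - 2.5) + 0.2 * (absences - 10.0))
        rows.append(([gpa, absences], 1 if rng.random() < risk else 0))
    return rows


def fit_logistic(rows, learning_rate=0.01, epochs=200):
    """Fit logistic regression by stochastic gradient descent on log loss."""
    weights = [0.0] * len(rows[0][0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in rows:
            z = bias + sum(w * x for w, x in zip(weights, features))
            error = sigmoid(z) - label  # gradient of log loss with respect to z
            weights = [w - learning_rate * error * x
                       for w, x in zip(weights, features)]
            bias -= learning_rate * error

    def predict(features):
        z = bias + sum(w * x for w, x in zip(weights, features))
        return 1 if z > 0 else 0

    return predict


def fit_stump(rows):
    """Fit a depth-1 decision tree: the single split with best training accuracy."""
    best = None  # (correct_count, feature_index, threshold, label_if_above)
    for j in range(len(rows[0][0])):
        for threshold in sorted({features[j] for features, _ in rows}):
            for above in (0, 1):
                correct = sum(
                    (above if features[j] > threshold else 1 - above) == label
                    for features, label in rows
                )
                if best is None or correct > best[0]:
                    best = (correct, j, threshold, above)
    _, best_j, best_threshold, best_above = best

    def predict(features):
        return best_above if features[best_j] > best_threshold else 1 - best_above

    return predict


def accuracy(predict, rows):
    """Fraction of rows whose predicted label matches the true label."""
    return sum(predict(features) == label for features, label in rows) / len(rows)


# Train on one synthetic sample and evaluate on a separate held-out sample.
train = make_synthetic_students(300, seed=1)
test = make_synthetic_students(200, seed=2)

candidates = {
    "logistic_regression": fit_logistic(train),
    "decision_stump": fit_stump(train),
}
results = {name: accuracy(model, test) for name, model in candidates.items()}
best_model = max(results, key=results.get)

for name, score in sorted(results.items()):
    print(f"{name}: held-out accuracy = {score:.3f}")
print("selected:", best_model)
```

Accuracy on a held-out sample is only one possible evaluation measure, chosen here for simplicity. The same compare-then-select pattern applies when the candidates are the logistic regression, decision tree, and random forest models discussed above.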
Regression models
Regression was used in...