Chapter 5. Risk Scoring on Spark
Starting with this chapter, we will take a deeper look at specific technologies for applying Apache Spark to machine learning. This chapter focuses on notebooks for Spark; Chapter 6, Churn Prediction on Spark, focuses on machine learning libraries, including MLlib; and Chapter 7, Recommendations on Spark, focuses on SPSS with Spark.
Specifically, in this chapter, we will review machine learning methods and analytical processes for a risk scoring project and implement them with R notebooks on Apache Spark in a special DataScientistWorkbench environment. We will also discuss how Apache Spark notebooks help keep the whole workflow well organized and easy to manage. The following topics will be covered in this chapter:
- Spark for risk scoring
- Methods for risk scoring
- Data and feature preparation
- Model estimation
- Model evaluation
- Results explanation
- Deployment of risk scoring