This chapter introduced several important concepts, including data cleanup, the handling of missing and categorical values, the use of Spark and H2O to train multiclass classification models, and various evaluation metrics for classification models. Furthermore, the chapter introduced the notion of model ensembles, demonstrated on RandomForest as an ensemble of decision trees.
The reader should now see the importance of data preparation, which plays a key role in every model training and evaluation process. Training and using a model without understanding the modeling context can lead to misleading decisions. Moreover, every model needs to be evaluated with respect to the modeling goal (for example, minimizing false positives). Hence, understanding the trade-offs between different metrics for classification models is crucial.
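To make the metric trade-off concrete, the following is a minimal sketch in plain Python rather than Spark or H2O. It assumes a binary classifier that outputs a score (the probability of class 1) and uses hypothetical toy labels and scores; the helper names `confusion_counts` and `precision_recall` are illustrative and not part of any library. Moving the decision threshold changes the number of false positives, and precision and recall move in opposite directions as a result.

```python
def confusion_counts(labels, scores, threshold):
    """Count TP, FP, FN, TN for a binary classifier at a given score threshold."""
    tp = fp = fn = tn = 0
    for y, s in zip(labels, scores):
        predicted = 1 if s >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall(labels, scores, threshold):
    """Return (precision, recall) at the given threshold."""
    tp, fp, fn, _ = confusion_counts(labels, scores, threshold)
    # Convention chosen for this sketch: with no positive predictions,
    # there are no false positives, so precision is reported as 1.0.
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical toy data: true labels and model scores (probability of class 1).
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.95, 0.80, 0.60, 0.35, 0.70, 0.40, 0.30, 0.20, 0.10, 0.05]

for t in (0.3, 0.5, 0.9):
    tp, fp, fn, tn = confusion_counts(labels, scores, t)
    p, r = precision_recall(labels, scores, t)
    print(f"threshold={t}: FP={fp} precision={p:.2f} recall={r:.2f}")
```

With this toy data, a threshold of 0.3 yields three false positives (precision 0.57, recall 1.00), while 0.9 yields none (precision 1.00, recall 0.25). If the modeling goal is to minimize false positives, the higher threshold is preferable, but only at the cost of missing most of the positive cases.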
In this chapter, we did not cover all possible modeling tricks for...