Model estimation
With the feature sets finalized in the previous section, the next step is to estimate the parameters of the selected models, for which we can use MLlib on the Zeppelin notebook.
As before, obtaining the best models requires distributed computing, especially in this case, where we model various student segments across various study subjects. Readers may refer to the previous chapters for the distributed computing setup, which we will not repeat here.
Spark implementation with the Zeppelin notebook
For the random forest, we will use the following Scala code with MLlib:
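import org.apache.spark.mllib.tree.RandomForest
import org.apache.spark.mllib.tree.configuration.Strategy

// Train a RandomForest model.
// trainingData is the RDD[LabeledPoint] prepared in the earlier steps.
val treeStrategy = Strategy.defaultStrategy("Classification")
val numTrees = 300
val featureSubsetStrategy = "auto" // Let the algorithm choose.
val model = RandomForest.trainClassifier(trainingData, treeStrategy,
  numTrees, featureSubsetStrategy, seed = 12345)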
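To try the random forest call end to end in a notebook paragraph, a minimal sketch might look like the following. It is not the chapter's actual pipeline: the toy data, the 70/30 split, and the names sparkCtx, points, and testErr are assumptions made for illustration, and SparkContext.getOrCreate is used so that an existing notebook context is reused where one is available.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.tree.RandomForest
import org.apache.spark.mllib.tree.configuration.Strategy
import scala.util.Random

// Reuse the running SparkContext if there is one; otherwise start a local one.
val sparkCtx = SparkContext.getOrCreate(
  new SparkConf().setAppName("rf-sketch").setMaster("local[*]"))

// Synthetic toy data (an assumption, NOT the student data from the text):
// two features in [0, 1), label 1.0 when their sum exceeds 1.0.
val rng = new Random(12345)
val points = Seq.fill(200) {
  val x1 = rng.nextDouble()
  val x2 = rng.nextDouble()
  LabeledPoint(if (x1 + x2 > 1.0) 1.0 else 0.0, Vectors.dense(x1, x2))
}

// Hypothetical 70/30 split into training and test sets.
val splits = sparkCtx.parallelize(points).randomSplit(Array(0.7, 0.3), seed = 12345L)
val (trainingData, testData) = (splits(0), splits(1))

// Same settings as in the text: default classification strategy, 300 trees.
val treeStrategy = Strategy.defaultStrategy("Classification")
val numTrees = 300
val featureSubsetStrategy = "auto" // Let the algorithm choose.
val model = RandomForest.trainClassifier(
  trainingData, treeStrategy, numTrees, featureSubsetStrategy, seed = 12345)

// Fraction of test points whose predicted label differs from the true label.
val labelAndPreds = testData.map(p => (p.label, model.predict(p.features)))
val testErr = labelAndPreds.filter(r => r._1 != r._2).count.toDouble / testData.count()
println(s"Test error = $testErr")
```

With real data, trainingData would come from the feature preparation steps rather than from generated points, and the number of trees would be tuned against the held-out error.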
For the decision tree, we will execute the following code:
val model = DecisionTree.trainClassifier(trainingData...
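The call above is truncated in the source, and its remaining arguments are not reconstructed here. As a separate illustration, the sketch below shows one overload of DecisionTree.trainClassifier in the RDD-based MLlib API, which takes the number of classes, a map of categorical features, an impurity measure, a maximum depth, and a maximum number of bins. All values shown, along with the toy data and the names localCtx, toyData, and treeModel, are assumptions chosen for illustration rather than the settings used in this chapter.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.tree.DecisionTree

// Reuse the running SparkContext if there is one; otherwise start a local one.
val localCtx = SparkContext.getOrCreate(
  new SparkConf().setAppName("dt-sketch").setMaster("local[*]"))

// Tiny hand-made data set (an assumption, NOT the student data from the text).
val toyData = localCtx.parallelize(Seq(
  LabeledPoint(0.0, Vectors.dense(0.1, 0.2)),
  LabeledPoint(0.0, Vectors.dense(0.3, 0.1)),
  LabeledPoint(1.0, Vectors.dense(0.8, 0.9)),
  LabeledPoint(1.0, Vectors.dense(0.7, 0.6))
))

// Hypothetical settings for illustration only.
val numClasses = 2
val categoricalFeaturesInfo = Map[Int, Int]() // empty: all features continuous
val impurity = "gini"
val maxDepth = 5
val maxBins = 32

val treeModel = DecisionTree.trainClassifier(
  toyData, numClasses, categoricalFeaturesInfo, impurity, maxDepth, maxBins)

// Print the learned tree as nested if/else rules.
println(treeModel.toDebugString)
```

An empty categoricalFeaturesInfo map tells MLlib to treat every feature as continuous; categorical features would be declared as entries mapping a feature index to its number of categories.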