Hyperparameters
We have glossed over an important aspect: model tuning. As you have seen, each algorithm exposes many parameters that can be tuned, yet so far we have set them only once. For example, for the recommender we set rank=12, regularizationParameter=0.1, and maxIterations=20. In practice, the rank could be 8 or 12; the regularization parameter 0.1, 1.0, or 10; and the number of iterations 10 or 20. Trying every combination means 2 × 3 × 2 = 12 runs: train a model with each set of values, evaluate it (for example, by its error on held-out data), and select the one with the best score. This is a simple case; real problems can involve many more parameters and well over 100 runs. This is where systematic parameter search with cross-validation comes into the picture. To keep this book within its boundaries, I will leave this part for you to explore. Two good starting points are the documentation for the org.apache.spark.ml.tuning package and the example code at https://github.com/apache/spark/tree/master/examples/src/main/java/org/apache/spark/examples/ml.
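To make the 12-run arithmetic concrete, here is a minimal plain-Java sketch of grid search. It enumerates every combination of the candidate values from the text and keeps the one with the lowest validation error. It does not use Spark at all. The class and method names (GridSearchSketch, buildGrid, selectBest) are hypothetical, and toyValidationError is an invented formula standing in for "train the recommender and measure its error on held-out data". In a real Spark pipeline that scoring step is what classes such as ParamGridBuilder and CrossValidator in org.apache.spark.ml.tuning take care of for you.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToDoubleFunction;

public class GridSearchSketch {

    // One combination of the three recommender hyperparameters discussed in the text.
    public static final class Candidate {
        public final int rank;
        public final double regularizationParameter;
        public final int maxIterations;

        public Candidate(int rank, double regularizationParameter, int maxIterations) {
            this.rank = rank;
            this.regularizationParameter = regularizationParameter;
            this.maxIterations = maxIterations;
        }

        @Override
        public String toString() {
            return "rank=" + rank + ", regularizationParameter=" + regularizationParameter
                    + ", maxIterations=" + maxIterations;
        }
    }

    // Builds the Cartesian product of the candidate values: one Candidate per run.
    public static List<Candidate> buildGrid(int[] ranks, double[] regParams, int[] iterations) {
        List<Candidate> grid = new ArrayList<>();
        for (int rank : ranks) {
            for (double reg : regParams) {
                for (int iter : iterations) {
                    grid.add(new Candidate(rank, reg, iter));
                }
            }
        }
        return grid;
    }

    // Scores every candidate and keeps the one with the lowest error (first one wins on ties).
    public static Candidate selectBest(List<Candidate> grid, ToDoubleFunction<Candidate> error) {
        Candidate best = null;
        double bestError = Double.POSITIVE_INFINITY;
        for (Candidate c : grid) {
            double e = error.applyAsDouble(c);
            if (e < bestError) {
                bestError = e;
                best = c;
            }
        }
        return best;
    }

    // HYPOTHETICAL toy error, invented only so the sketch runs. A real version would train
    // the recommender with these values and measure its error (e.g. RMSE) on held-out data.
    public static double toyValidationError(Candidate c) {
        return 1.0 / c.rank + 0.01 * c.regularizationParameter + 1.0 / c.maxIterations;
    }

    public static void main(String[] args) {
        List<Candidate> grid = buildGrid(
                new int[] {8, 12},
                new double[] {0.1, 1.0, 10.0},
                new int[] {10, 20});
        System.out.println("runs: " + grid.size()); // prints "runs: 12"
        Candidate best = selectBest(grid, GridSearchSketch::toyValidationError);
        System.out.println("best: " + best);
    }
}
```

Passing the error function in as a parameter keeps the search loop independent of how each model is trained and scored. Cross-validation fits into the same shape: instead of a single held-out score, each candidate's error becomes the average over several train/validation folds, which multiplies the number of training runs by the number of folds.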