Once we have extracted these simple features from our raw data, we are ready to proceed with model training; Spark's ML library takes care of this for us. All we have to do is provide the correctly parsed input dataset we just created, along with our chosen model parameters.
Split the dataset into training and test sets with an 80:20 ratio, as shown in the following lines of code:
def createALSModel(): Unit = {
  // Load the parsed ratings dataset produced during feature extraction
  val ratings = FeatureExtraction.getFeatures()
  // Randomly split the ratings into training (80%) and test (20%) sets
  val Array(training, test) = ratings.randomSplit(Array(0.8, 0.2))
  // Print the first training record to verify the split
  println(training.first())
}
You can find the code listing at: https://github.com/ml-resources/spark-ml/blob/branch-ed2/Chapter_05/2.0.0/scala-spark-app/src/main/scala/com/spark/recommendation/ALSModeling.scala
You will see the following output:
16/09/07 13:23:28 INFO Executor: Finished task 0.0 in stage 1.0 (TID 1). 1768 bytes result sent to driver
16/09/07 13:23:28 INFO TaskSetManager...
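At this point only the split has been performed; the training set still needs to be handed to the ALS estimator along with the model parameters mentioned earlier. The following is a minimal sketch of that step, assuming the dataset returned by FeatureExtraction.getFeatures() has userId, movieId, and rating columns — these column names and the parameter values are illustrative assumptions, not part of the listing above:

import org.apache.spark.ml.recommendation.ALS
import org.apache.spark.ml.evaluation.RegressionEvaluator

// Configure the ALS estimator; the column names and parameter values
// below are assumptions for illustration and should be adapted to the
// schema produced by FeatureExtraction.getFeatures()
val als = new ALS()
  .setRank(10)        // number of latent factors
  .setMaxIter(5)      // number of ALS iterations
  .setRegParam(0.01)  // regularization parameter
  .setUserCol("userId")
  .setItemCol("movieId")
  .setRatingCol("rating")

// Fit the model on the 80% training split
val model = als.fit(training)

// Score the held-out 20% and measure root-mean-square error
val predictions = model.transform(test)
val rmse = new RegressionEvaluator()
  .setMetricName("rmse")
  .setLabelCol("rating")
  .setPredictionCol("prediction")
  .evaluate(predictions)
println(s"Test RMSE = $rmse")

Note that randomSplit also accepts an optional seed (for example, ratings.randomSplit(Array(0.8, 0.2), seed = 12345)) if you need a reproducible split, and that on Spark 2.2 and later you can call setColdStartStrategy("drop") on the estimator so that users or items appearing only in the test set do not produce NaN predictions during evaluation.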