Performance measurements
So is all this effort worth it? Let's take a look at some performance comparisons between a local R script, MLlib, and Apache SystemML:
The ALS algorithm was run on datasets of 1.2, 12, and 120 GB using R, MLlib, and Apache SystemML. We can clearly see that, even on the smallest dataset, R is not a feasible solution: it ran for more than 24 hours, and we are not sure it would ever have completed. On the 12 GB dataset, Apache SystemML ran significantly faster than MLlib. Finally, on the 120 GB dataset, the ALS implementation of MLlib did not finish within one day, and we gave up.
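If you want to run a comparison like this yourself, the "give up after one day" rule can be automated with a small timing harness. The following is only a minimal sketch and not the setup used for the measurements above: the helper name `time_with_budget` is hypothetical, and the two commands are stand-ins for whatever actually launches your R, MLlib, or Apache SystemML job.

```python
import subprocess
import sys
import time


def time_with_budget(cmd, budget_seconds):
    """Run cmd as a child process and return (finished, elapsed_seconds).

    If the process exceeds budget_seconds, it is killed and reported as
    not finished. A non-zero exit code raises CalledProcessError.
    """
    start = time.perf_counter()
    try:
        subprocess.run(cmd, check=True, timeout=budget_seconds)
        finished = True
    except subprocess.TimeoutExpired:
        # subprocess.run has already killed the child at this point.
        finished = False
    return finished, time.perf_counter() - start


if __name__ == "__main__":
    # Placeholder commands (assumptions): replace them with the real job launchers.
    quick_job = [sys.executable, "-c", "print('done')"]
    slow_job = [sys.executable, "-c", "import time; time.sleep(30)"]

    for name, cmd in [("quick", quick_job), ("slow", slow_job)]:
        finished, elapsed = time_with_budget(cmd, budget_seconds=2)
        status = "finished" if finished else "did not finish within budget"
        print(f"{name}: {status} after {elapsed:.1f} s")
```

For a real benchmark, `budget_seconds` would be `24 * 60 * 60`. Recording a run as "did not finish" rather than as a time keeps such results honest: all we know is a lower bound on the runtime.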