Recommendation
Recommendation systems are among the most visible and popular machine learning applications on the Web, from Amazon to LinkedIn to Walmart. The algorithms behind recommendation systems are very interesting. They fall into roughly five general mechanisms: knowledge-based, demographic-based, content-based, collaborative filtering (item-based or user-based), and latent factor-based. Of these, collaborative filtering is the most widely used and, unfortunately, very computationally intensive.
Spark implements a scalable variation, the Alternating Least Squares (ALS) algorithm, described in a paper coauthored by Yehuda Koren and available at http://dl.acm.org/citation.cfm?id=1608614. It is a user-based collaborative filtering mechanism that uses the latent factors method of learning, which can scale to a large Dataset. Let's quickly use the MovieLens medium Dataset to implement a recommendation using Spark. While the model development patterns follow what we have seen so far, there are...
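To give a feel for what "alternating" means in ALS, here is a minimal sketch in plain Python. It is not Spark's implementation: it uses a single latent factor per user and per item (rank 1), so each least-squares step reduces to a one-line closed-form update. The ratings table, the regularization value `reg`, and the function names `loss` and `als_rank1` are all made up for illustration.

```python
# Toy rank-1 alternating least squares on a tiny, hypothetical ratings table.
# Illustrative only: this is a simplified sketch, not Spark's ALS.

ratings = {  # (user, item) -> rating; pairs that are absent are unobserved
    (0, 0): 5.0, (0, 1): 3.0,
    (1, 0): 4.0, (1, 2): 1.0,
    (2, 1): 2.0, (2, 2): 1.0,
}
n_users, n_items = 3, 3
reg = 0.1  # L2 regularization strength (assumed value)


def loss(x, y):
    """Regularized squared error over the observed ratings only."""
    squared = sum((r - x[u] * y[i]) ** 2 for (u, i), r in ratings.items())
    penalty = reg * (sum(v * v for v in x) + sum(v * v for v in y))
    return squared + penalty


def als_rank1(iterations=10):
    x = [1.0] * n_users  # one latent factor per user
    y = [1.0] * n_items  # one latent factor per item
    history = [loss(x, y)]
    for _ in range(iterations):
        # Step 1: hold item factors fixed, solve each user's ridge regression.
        for u in range(n_users):
            num = sum(r * y[i] for (uu, i), r in ratings.items() if uu == u)
            den = reg + sum(y[i] ** 2 for (uu, i), _ in ratings.items() if uu == u)
            x[u] = num / den
        # Step 2: hold user factors fixed, solve each item's ridge regression.
        for i in range(n_items):
            num = sum(r * x[u] for (u, ii), r in ratings.items() if ii == i)
            den = reg + sum(x[u] ** 2 for (u, ii), _ in ratings.items() if ii == i)
            y[i] = num / den
        history.append(loss(x, y))
    return x, y, history


x, y, history = als_rank1()
print("loss per iteration:", [round(v, 4) for v in history])
print("predicted rating for user 0, item 2:", round(x[0] * y[2], 3))
```

Because each step exactly minimizes the loss over one set of factors while the other is held fixed, the loss never increases from one iteration to the next. With more than one latent factor, each update becomes a small linear system per user or per item. Those per-user and per-item solves are independent of one another, which is what makes the method amenable to distribution.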