Using MLlib to produce movie recommendations
Let's take a look at some code to actually run Alternating Least Squares recommendations on the MovieLens dataset. You'll see just how simple it is to do and we'll take a look at the results.
You can download the script from the download package for this book. Look for movie-recommendations-als.py, download that into your SparkCourse folder, and then we can play with it. The script requires us to input the user ID that we want recommendations for. So, how do we know if the recommendations are good? Since we don't personally know any of the people in this dataset from MovieLens, we need to create a fictitious user; we can hack their data into the dataset. So, in the ml-100k folder, I've edited the u.data file. What I've done here is add three lines to the top for user ID 0, because I happen to know that user ID 0 does not exist in this dataset.
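If you'd rather make that kind of edit with a script than by hand, here's a minimal sketch of the idea. It assumes the standard tab-separated u.data layout (user ID, movie ID, rating, timestamp). The helper name add_fake_user, the sample rows, and the movie IDs, ratings, and timestamp passed to it are all placeholders made up for illustration; they are not necessarily the lines I actually added.

```python
def add_fake_user(lines, fake_ratings, user_id=0, timestamp=881250949):
    """Return the rating rows with a fictitious user's ratings prepended.

    lines        -- rows in the assumed u.data layout:
                    user ID, movie ID, rating, timestamp (tab-separated)
    fake_ratings -- list of (movie_id, rating) pairs for the fictitious user
    timestamp    -- placeholder value written into every new row
    """
    # Check that the chosen user ID is really unused before adding rows for it.
    existing_ids = {int(line.split("\t")[0]) for line in lines if line.strip()}
    if user_id in existing_ids:
        raise ValueError(f"user ID {user_id} already exists in the data")

    # Build one row per rating and put the new rows at the top.
    new_rows = [
        f"{user_id}\t{movie_id}\t{rating}\t{timestamp}"
        for movie_id, rating in fake_ratings
    ]
    return new_rows + list(lines)


# Made-up rows in the same shape as u.data, standing in for the real file.
sample = ["7\t10\t3\t881250000", "8\t20\t4\t881250001"]

# Hypothetical ratings: two movies the fake user loves, one they dislike.
updated = add_fake_user(sample, [(50, 5), (172, 5), (133, 1)])

for row in updated:
    print(row)
```

To apply this to the real file, you would read the lines of u.data, pass them through the function, and write the result back out. Keep a copy of the original file first so you can undo the change.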
I looked up a few movies that I'm familiar with, so I can get a little more of...