Analyzing the ALS recommendations results
Open up Command Prompt and type spark-submit movie-recommendations-als.py with user 0:
That user 0 is my Star Wars fan that doesn't like Gone With The Wind.
Off it goes, using all the cores that I have. It should finish quite quickly. For such a fancy algorithm, that came back creepily fast, almost suspiciously so:
So, for my fictitious user who loves Star Wars and The Empire Strikes Back, but hated Gone With The Wind, the top recommendations it produced were things called Love in the Afternoon and Roommates. What? What is this stuff? That's crazy. Lost in Space, okay, I can go with that, but the rest of this just doesn't make sense. What's worse is that if I run it again, I'll actually get different results! Now, it could be that the algorithm is taking some shortcuts and randomly sampling things to save time, but even so, that's not good news.
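One well-known source of this kind of run-to-run variation is that ALS starts from randomly initialized factors, so with only a few iterations the answer depends on where it started. To make that concrete, here is a toy sketch in plain Python. It is not Spark's implementation and not the script above: the ratings, the `rank1_als` function, and its parameters are all made up for illustration, and it only handles a single latent factor.

```python
import random

# Hypothetical toy ratings: (user, item) -> rating. Not the MovieLens data.
RATINGS = {
    (0, 0): 5.0, (0, 1): 5.0, (0, 2): 1.0,
    (1, 0): 4.0, (1, 2): 2.0,
    (2, 1): 1.0, (2, 2): 5.0,
}
N_USERS, N_ITEMS = 3, 3


def rank1_als(ratings, n_users, n_items, seed, iterations=2, reg=0.1):
    """Toy rank-1 alternating least squares (illustrative only)."""
    rng = random.Random(seed)
    # Random starting factors: this is where the seed matters.
    u = [rng.uniform(0.1, 1.0) for _ in range(n_users)]
    v = [rng.uniform(0.1, 1.0) for _ in range(n_items)]
    for _ in range(iterations):
        # Hold item factors fixed, solve each user factor in closed form.
        for i in range(n_users):
            seen = [(j, r) for (a, j), r in ratings.items() if a == i]
            u[i] = sum(r * v[j] for j, r in seen) / (
                reg + sum(v[j] ** 2 for j, _ in seen))
        # Hold user factors fixed, solve each item factor in closed form.
        for j in range(n_items):
            seen = [(i, r) for (i, b), r in ratings.items() if b == j]
            v[j] = sum(r * u[i] for i, r in seen) / (
                reg + sum(u[i] ** 2 for i, _ in seen))
    return u, v


if __name__ == "__main__":
    for seed in (42, 42, 7):
        u, v = rank1_als(RATINGS, N_USERS, N_ITEMS, seed=seed)
        # Predicted rating of item 1 for user 1, which user 1 never rated.
        print(seed, u[1] * v[1])
```

Running it with the same seed twice gives identical factors, while a different seed gives different ones after the same small number of iterations. If the script uses MLlib's `ALS.train`, that function accepts a `seed` argument, so fixing it should at least remove this particular source of variation. It would not make odd recommendations any better, only repeatable.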
Let's see what we get if we run it again. We get a totally different set of results:
There's something...