In this section, we will develop a model-based book recommendation system with the Spark MLlib library. The books and their ratings were downloaded from http://www2.informatik.uni-freiburg.de/~cziegler/BX/. The dataset consists of three CSV files:
- BX-Users.csv: Contains the users' demographic data. Each user is identified by a user ID (User-ID).
- BX-Books.csv: Contains book-related information such as Book-Title, Book-Author, Year-Of-Publication, and Publisher. Each book is identified by its ISBN. Three image URLs (Image-URL-S, Image-URL-M, and Image-URL-L) are also given.
- BX-Book-Ratings.csv: Contains the ratings, given in the Book-Rating column. Ratings are either explicit, on a scale from 1 to 10 (higher values denoting higher appreciation), or implicit, expressed by 0.
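To make the rating scale concrete, here is a minimal sketch in plain Python (not Spark) that separates explicit ratings (1 to 10) from implicit ones (0). The semicolon delimiter, the quoted fields, and the User-ID and ISBN column names in the ratings file are assumptions about the file layout. The sample rows and the `split_ratings` helper are hypothetical and exist only for illustration.

```python
import csv
import io

# Hypothetical sample shaped like BX-Book-Ratings.csv. The semicolon
# delimiter and quoted fields are assumptions about the file layout,
# and the rows themselves are made up for illustration.
SAMPLE = '''"User-ID";"ISBN";"Book-Rating"
"1";"0000000001";"0"
"1";"0000000002";"8"
"2";"0000000001";"10"
"3";"0000000003";"0"
'''


def split_ratings(text):
    """Split rows into explicit (1-10) and implicit (0) ratings."""
    reader = csv.DictReader(io.StringIO(text), delimiter=";", quotechar='"')
    explicit, implicit = [], []
    for row in reader:
        rating = int(row["Book-Rating"])
        record = (row["User-ID"], row["ISBN"], rating)
        if rating == 0:
            # A 0 marks an implicit interaction, not a low score.
            implicit.append(record)
        elif 1 <= rating <= 10:
            explicit.append(record)
        else:
            raise ValueError(f"rating out of range: {rating}")
    return explicit, implicit


explicit, implicit = split_ratings(SAMPLE)
print(len(explicit), "explicit ratings,", len(implicit), "implicit ratings")
```

Keeping the two groups apart matters because a 0 is not the lowest score on the scale. Treating it as one would bias any model trained on the explicit ratings.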
Before we jump into the coding part, we need to know a bit more about...