Loading and pre-processing the data
Our first goal in building our recommender systems is to load the data in R, preprocess it, and convert it into a rating matrix. More precisely, in each case, we will be creating a realRatingMatrix object, which is the specific data structure that the recommenderlab package uses to store numerical ratings.
We will start with the Jester dataset. If we download and unzip the archive from the website, we'll see that the file jesterfinal151cols.csv contains the ratings. More specifically, each row in this file corresponds to the ratings made by a particular user, and each column corresponds to a particular joke.
The columns are comma-separated and there is no header row. In fact, the file is almost a rating matrix already, except that the first column is special: it contains the total number of ratings made by a particular user. We will load this data into a data table using the function fread(), which is a fast implementation...
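The loading steps described so far can be sketched on a tiny in-memory sample instead of the real file. This is a minimal illustration and not necessarily the exact code used for the full dataset. The three sample rows and the variable names (sample_csv, ratings_dt, n_rated, ratings_m, ratings_rrm) are hypothetical. The sketch also assumes that the value 99 marks a joke the user did not rate, a convention that should be checked against the dataset's documentation.

```r
library(data.table)
library(recommenderlab)

# Hypothetical sample in the layout described above: no header row, and the
# first column holds the number of ratings made by that user.
# Assumption: 99 marks a joke that the user did not rate.
sample_csv <- "3,99,4.5,-2.25,7
2,1.5,99,99,-8.75
4,0.5,3,-1,9.25"

# For the real file, pass its path as the first argument instead of text =.
ratings_dt <- fread(text = sample_csv, sep = ",", header = FALSE)

# Keep the per-user counts for a sanity check, then drop the special column.
n_rated <- ratings_dt$V1
ratings_dt[, V1 := NULL]

# Convert to a numeric matrix and turn the assumed 99 sentinel into NA.
ratings_m <- as.matrix(ratings_dt)
ratings_m[ratings_m == 99] <- NA

# Label users (rows) and jokes (columns) so the result is easier to inspect.
dimnames(ratings_m) <- list(paste0("u", seq_len(nrow(ratings_m))),
                            paste0("j", seq_len(ncol(ratings_m))))

# Coerce to the rating matrix class used by recommenderlab.
ratings_rrm <- as(ratings_m, "realRatingMatrix")
```

Comparing rowCounts(ratings_rrm) with n_rated is a quick way to confirm that the missing-value convention was applied correctly, since the dropped first column should equal the number of ratings that remain for each user.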