Utility matrix
The data used in a recommendation system falls into two categories: users and items. Each user likes certain items, and the rating value r_ij (from 1 to 5) associated with user i and item j represents how much the user appreciates the item. These rating values are collected in a matrix, called the utility matrix R, in which each row i holds the ratings given by user i, while each column j holds the ratings that item j has received from the users who rated it. In our case, the data folder ml-100k contains a file called u.data (and also u.item, with the list of movie titles) that has been converted into a Pandas DataFrame (and saved as a CSV file, utilitymatrix.csv) by the following script:
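As a rough illustration of this kind of conversion (a minimal sketch, not necessarily the script used here), the code below assumes that u.data is tab-separated with the fields user id, item id, rating and timestamp, and that u.item is pipe-separated with the movie id and title as its first two fields. The function names load_ml100k and build_utility_matrix, and the column names user, item, rating and title, are hypothetical choices made for this example.

```python
import pandas as pd


def load_ml100k(folder="ml-100k"):
    """Read the raw files (assumed layout: tab-separated u.data, pipe-separated u.item)."""
    ratings = pd.read_csv(
        f"{folder}/u.data", sep="\t",
        names=["user", "item", "rating", "timestamp"],
    )
    movies = pd.read_csv(
        f"{folder}/u.item", sep="|", header=None, usecols=[0, 1],
        names=["item", "title"], encoding="latin-1",
    )
    return ratings, movies


def build_utility_matrix(ratings, movies):
    """Pivot (user, item, rating) triples into one row per user, one column per movie."""
    # Label each movie as "title;movie_id", the column format described in the text.
    titles = movies.set_index("item")["title"]
    labelled = ratings.assign(
        movie=ratings["item"].map(titles) + ";" + ratings["item"].astype(str)
    )
    # Cells with no rating are filled with 0, which marks a missing value.
    matrix = labelled.pivot_table(
        index="user", columns="movie", values="rating", fill_value=0
    )
    return matrix.reset_index()


# Tiny in-memory example (made-up ratings, for illustration only).
ratings = pd.DataFrame(
    {"user": [1, 1, 2], "item": [10, 20, 10], "rating": [5, 3, 4]}
)
movies = pd.DataFrame(
    {"item": [10, 20], "title": ["Toy Story (1995)", "GoldenEye (1995)"]}
)
utility = build_utility_matrix(ratings, movies)
print(utility)
```

With the real files, something like build_utility_matrix(*load_ml100k()).to_csv("utilitymatrix.csv", index=False) would write the matrix to disk, again under the file-layout assumptions stated above.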
The output of the first two lines is as follows:
Each column name, apart from the first (which is the user ID), gives the name of the movie and its ID in the MovieLens database, separated by a semicolon. The 0 values represent the missing values and we expect to have a...