CBF methods
This class of method relies on the data that describes the items, which is then used to extract the features of the users. In our MovieLens example, each movie j has a set of G binary fields to indicate if it belongs to one of the following genres: unknown, action, adventure, animation, children's, comedy, crime, documentary, drama, fantasy, film noir, horror, musical, mystery, romance, sci-fi, thriller, war, or western.
Based on these features (genres), each movie is described by a binary vector mj with G dimensions (number of movie genres) with entries equal to 1
for all the genres contained in movie j, or 0
otherwise. Given the dataframe
that stores the utility matrix called dfout
in the Utility matrix section mentioned earlier, these binary vectors mj are collected from the MoviesLens database
into a dataframe using the following script:
The movies content matrix has been saved in the movies_content.csv
file ready to be used by the CBF methods.
The goal of the content-based...