In this first scenario, we assume that we have a set of users represented by m-dimensional feature vectors:
Typical features are age, gender, interests, and so on. All of them must be encoded using one of the techniques discussed in the previous chapters (for example, they can be binarized, normalized in a fixed range, or transformed into one-hot vectors). However, in general, it's useful to avoid different variances that can negatively impact the computation of the distance between neighbors.
We have a set of k items:
Let's also assume that there is a relation that associates each user with a subset of items (bought or positively reviewed), items for which an explicit action or feedback has been performed:
In a user-based system, the users are periodically clustered (normally using a k-Nearest Neighbors (k-NN) approach), and therefore considering...