In some scenarios, the dataset can be structured as a matrix, where the rows represent one category and the columns represent another. For example, suppose we have a set of feature vectors representing the preferences (or ratings) that users expressed for a group of items. We can randomly create such a matrix, forcing roughly half of the ratings to be null (this is realistic, considering that a user never rates all possible items):
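import numpy as np
nb_users = 100
nb_products = 150
max_rating = 10
# Draw ratings in [1, max_rating] so that a 0 can only come from the mask
up_matrix = np.random.randint(1, max_rating + 1, size=(nb_users, nb_products))
# Binary mask: each entry is kept (1) or dropped (0) with equal probability
mask_matrix = np.random.randint(0, 2, size=(nb_users, nb_products))
# Zero out the dropped entries in place
up_matrix *= mask_matrix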
Here, we assume that 0 means that no rating has been provided, while a value between 1 and 10 is an actual rating. The resulting matrix is shown in the following...
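As a quick sanity check on this kind of user-product matrix, it can help to measure how sparse it actually is and how many ratings each user provided. The following is only a minimal sketch under a few assumptions that are not in the original listing: it uses NumPy's `default_rng` with an arbitrary seed for reproducibility, it draws ratings from 1 to `max_rating` so that a 0 can only come from the mask, and the names `rated`, `missing_fraction`, `ratings_per_user`, and `mean_rating_per_user` are illustrative.

```python
import numpy as np

# Assumption: a seeded generator, used here only to make the sketch reproducible
rng = np.random.default_rng(1000)

nb_users = 100
nb_products = 150
max_rating = 10

# Ratings in [1, max_rating]; the binary mask decides which ones are kept
ratings = rng.integers(1, max_rating + 1, size=(nb_users, nb_products))
mask = rng.integers(0, 2, size=(nb_users, nb_products))
up_matrix = ratings * mask

# Boolean matrix: True where a rating exists (0 means "no rating")
rated = up_matrix > 0

# Fraction of missing entries over the whole matrix
missing_fraction = 1.0 - rated.mean()

# Number of rated products per user (row-wise count)
ratings_per_user = rated.sum(axis=1)

# Mean rating per user, computed only over the rated products;
# np.maximum avoids a division by zero for a user with no ratings
mean_rating_per_user = up_matrix.sum(axis=1) / np.maximum(ratings_per_user, 1)

print('Missing fraction: {:.3f}'.format(missing_fraction))
print('Min/max ratings per user: {} / {}'.format(ratings_per_user.min(), ratings_per_user.max()))
```

Averaging over the rated entries only, rather than over the whole row, keeps the zeros from being interpreted as very low ratings, which is the main pitfall of encoding missing values as 0.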