Extracting the right features from your data
In this section, we will use explicit rating data, without additional user or item metadata or any other information related to the user-item interactions. Hence, the only features that we need as inputs are the user IDs, the movie IDs, and the rating assigned to each user-movie pair.
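As a minimal sketch of these three input features, the following plain Scala snippet models a single explicit rating and parses it from one line of text. The names RatingRecord, RatingParsing, and parseRating are hypothetical and are not part of the chapter's code. The snippet assumes the tab-separated layout of the MovieLens 100k u.data file (user ID, movie ID, rating, timestamp) and simply discards the timestamp.

```scala
import scala.util.Try

// Hypothetical record type holding the three features used in this section.
final case class RatingRecord(userId: Int, movieId: Int, rating: Double)

object RatingParsing {
  // Parses one tab-separated line: userId, movieId, rating, then any further
  // fields (such as the timestamp), which are ignored.
  // Returns None for malformed lines instead of throwing an exception.
  def parseRating(line: String): Option[RatingRecord] =
    line.split("\t") match {
      case Array(user, movie, rating, _*) =>
        Try(RatingRecord(user.trim.toInt, movie.trim.toInt, rating.trim.toDouble)).toOption
      case _ => None
    }
}
```

For example, RatingParsing.parseRating("196\t242\t3\t881250949") yields Some(RatingRecord(196, 242, 3.0)), whereas a line with fewer than three fields or with non-numeric values yields None. Returning an Option keeps malformed rows from aborting the whole load; whether to drop or log such rows is a design choice left open here.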
Extracting features from the MovieLens 100k dataset
In this example, we will use the same MovieLens dataset that we used in the previous chapter. Use the directory in which you placed the MovieLens 100k dataset as the input path in the following code.
First, let's inspect the raw ratings dataset:
import org.apache.spark.sql.{Dataset, SparkSession}

object FeatureExtraction {
  def getFeatures(): Dataset[FeatureExtraction.Rating] = {
    val spark = SparkSession.builder.master("local[2]...