In this section, we walk through the process of generating recommendations in Apache Spark using alternating least squares (ALS).
Finding recommendations through Apache Spark's ALS
Data gathering and exploration
The first step is to download the data, which is available at https://sites.google.com/site/limkwanhui/datacode. We will use the poiList-sigir17 dataset, which is built from photos taken by users at different theme park attractions (identified as points of interest by Flickr). There are two datasets we're interested in:
- The first dataset is the list of points of interest, which captures the names and other properties of each attraction:
poi_df = spark.read.csv(SRC_PATH + 'data-sigir17/poiList-sigir17',
header=True, inferSchema...