Case Study: Training a Recommender System in PySpark
To close this chapter, let us look at an example of how we might generate a large-scale recommendation system using dimensionality reduction. The dataset we will work with comes from a set of user transactions from an online store (Chen, Daqing, Sai Laing Sain, and Kun Guo. Data mining for the online retail industry: A case study of RFM model-based customer segmentation using data mining. Journal of Database Marketing & Customer Strategy Management 19.3 (2012): 197-208). In this model, we will input a matrix in which the rows are users and the columns represent items in the catalog of an e-commerce site. Items purchased by a user are indicated by a 1. Our goal is to factorize this matrix into 1 x k user factors (row components) and k x 1 item factors (column components) using k components. Then, presented with a new user and their purchase history, we can predict what items they are like to buy in the future, and thus what we might...