Apache Spark for a recommendation engine
In this section, we will continue to demonstrate Spark's computation speed and ease of coding for a real-life project of movie recommendation, but to be completed by SPSS on Apache Spark.
SPSS is a widely used software package for statistical analysis. SPSS originally stood for Statistical Package for Social Science, but it is also used by market researchers, health researchers, survey companies, government, education researchers, marketing organizations, data miners, and others. Long produced by SPSS Inc., it was acquired by IBM in 2009. Since then, IBM further developed it and turned it into a popular tool for data scientists and machine learning professionals. To make Spark available to SPSS users, IBM developed technologies making SPSS Spark integration easy, which will be covered in this chapter.
The use case
This project is to help movie rental company ZHO improve its movie recommendations to its customers.
The main data set contains tens of million...