Exploring the ratings data details for the recommendation system in Spark 2.0
In this recipe, we explore the data from the user/rating perspective to understand the nature and properties of our data file. We start by parsing the ratings data file into a Scala case class and then generate a visualization for insight. The ratings data will be used a little later to generate features for our recommendation engine. Again, we stress that the first step in any data science/machine learning exercise should be the visualization and exploration of the data.
The quickest way to understand data is to visualize it, and we will use a JFreeChart scatterplot to do this. A quick look at the resulting chart of users by ratings shows a resemblance to a multinomial distribution, with outliers and increasing sparsity as the ratings increase in magnitude.
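Before the step-by-step walkthrough, the parsing idea can be sketched in plain Scala, without Spark or JFreeChart. This is a minimal sketch under stated assumptions, not the recipe's actual code: the case class `Rating`, its field names, the helper object `RatingsExploration`, and the `::`-delimited layout `UserID::MovieID::Rating::Timestamp` are all assumed here for illustration. It parses each line into a case class, skips malformed lines, and counts ratings per user, which is the kind of (user, count) pair a scatterplot could display.

```scala
import scala.util.Try

// Hypothetical record type; the field names are assumptions for illustration.
case class Rating(userId: Int, movieId: Int, rating: Float, timestamp: Long)

object RatingsExploration {

  // Parse one line, assuming the layout UserID::MovieID::Rating::Timestamp.
  // Lines with the wrong number of fields or non-numeric values yield None.
  def parseRating(line: String): Option[Rating] =
    line.split("::") match {
      case Array(u, m, r, t) =>
        Try(Rating(u.trim.toInt, m.trim.toInt, r.trim.toFloat, t.trim.toLong)).toOption
      case _ => None
    }

  // Count how many ratings each user has submitted.
  def ratingsPerUser(ratings: Seq[Rating]): Map[Int, Int] =
    ratings.groupBy(_.userId).map { case (user, rs) => user -> rs.size }
}
```

With a handful of made-up sample lines, `lines.flatMap(RatingsExploration.parseRating)` drops anything unparseable, and `RatingsExploration.ratingsPerUser` then gives the per-user counts. Returning `Option` rather than throwing keeps a single bad line from aborting the whole exploration; in a Spark job the same function could be applied with `flatMap` over the lines of the ratings file.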
How to do it...
- Start a new project in IntelliJ or in an IDE of your choice...