Understanding the data
Understanding your data is critical to all data-related work. In this recipe, we will acquire and take a first look at the data that we will be using to build our recommendation engine.
Getting ready
To prepare for this recipe, and the rest of the chapter, download the MovieLens data from the GroupLens website of the University of Minnesota. You can find the data at http://grouplens.org/datasets/movielens/.
In this chapter, we will use the smaller MovieLens 100k dataset (4.7 MB in size) so that the entire model can be loaded into memory with ease.
How to do it...
Perform the following steps to better understand the data that we will be working with throughout this chapter:
- Download the data from http://grouplens.org/datasets/movielens/. The 100K dataset is the one that you want (ml-100k.zip).
- Unzip the downloaded data into the directory of your choice.
- The two files that we are mainly concerned with are u.data, which contains the user movie ratings, and u.item, which contains...
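The unzip-and-inspect steps above can also be done from Python. The following is only a minimal sketch using the standard library: the helper names extract_archive and peek are our own, the paths are placeholders for wherever you saved the download, and the latin-1 encoding is an assumption chosen so that any non-ASCII characters in the files do not raise a decoding error.

```python
import zipfile
from itertools import islice


def extract_archive(zip_path, dest_dir):
    """Unzip the archive at zip_path into dest_dir and return the member names."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)  # creates dest_dir if it does not exist
        return archive.namelist()


def peek(path, n=5, encoding="latin-1"):
    """Return the first n lines of a text file, without trailing newlines.

    The latin-1 default is an assumption: it decodes every byte value,
    so it never fails on a file whose encoding we have not checked yet.
    """
    with open(path, encoding=encoding) as handle:
        return [line.rstrip("\n") for line in islice(handle, n)]
```

For example, calling extract_archive("ml-100k.zip", "data") and then printing its return value shows exactly where u.data and u.item ended up; passing either of those paths to peek gives a first look at the raw records before we write any parsing code.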