Exploring the Book-Crossing dataset
In this section, we will perform exploratory data analysis on a new dataset and visualize its main characteristics.
The Book-Crossing dataset [1] is a collection of book ratings provided by 278,858 users of the BookCrossing community (www.bookcrossing.com). It contains 1,149,780 ratings of 271,379 books. Each rating is either explicit (a score between 1 and 10) or implicit (the user interacted with the book without scoring it). The dataset was collected by Cai-Nicolas Ziegler during a four-week crawl in August and September 2004. In this chapter, we will use it to build a book recommender system.
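The split between explicit and implicit ratings matters for a recommender, so it can help to separate them early. The sketch below does this with the standard `csv` module. It assumes that the ratings file uses semicolon-separated, quoted fields with the columns `User-ID`, `ISBN` and `Book-Rating`, and that an implicit interaction is stored as a rating of 0. The three sample rows and the helper name `split_ratings` are hypothetical and only illustrate the format.

```python
import csv
import io

# Hypothetical sample in the assumed layout of the ratings file:
# semicolon-separated, quoted fields, with a header row first.
SAMPLE = '''"User-ID";"ISBN";"Book-Rating"
"1";"0000000001";"0"
"2";"0000000002";"5"
"3";"0000000003";"0"
'''


def split_ratings(text):
    """Return (explicit, implicit) lists of (user_id, isbn, rating) tuples.

    Assumption: a rating of 0 marks an implicit interaction,
    and a rating from 1 to 10 is an explicit score.
    """
    explicit, implicit = [], []
    reader = csv.DictReader(io.StringIO(text), delimiter=';')
    for row in reader:
        rating = int(row['Book-Rating'])
        if not 0 <= rating <= 10:
            raise ValueError(f'unexpected rating: {rating}')
        record = (int(row['User-ID']), row['ISBN'], rating)
        # Route the record by rating type.
        if rating == 0:
            implicit.append(record)
        else:
            explicit.append(record)
    return explicit, implicit


explicit, implicit = split_ratings(SAMPLE)
print(len(explicit), len(implicit))  # 1 2
```

Keeping the ISBN as a string is deliberate: ISBNs can end in `X` and can have leading zeros, so converting them to integers would corrupt them.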
Let’s download the dataset and unzip it with the following commands:
from io import BytesIO
from urllib.request import urlopen
from zipfile import ZipFile

url = 'http://www2.informatik.uni-freiburg.de/~cziegler/BX/BX-CSV-Dump.zip'
with urlopen(url) as zurl:
    with ZipFile(BytesIO(zurl.read())) as zfile...
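These commands download the archive into memory and open it with `ZipFile`. Below is a minimal sketch of the complete download-and-unzip step. It assumes the archive is unpacked with `ZipFile.extractall()` into a target directory, and the helper names `extract_zip_bytes` and `download_and_extract` are hypothetical. Keeping extraction separate from the download means the unzip logic can be checked on any in-memory archive without network access.

```python
from io import BytesIO
from urllib.request import urlopen
from zipfile import ZipFile


def extract_zip_bytes(data, dest='.'):
    """Unpack a zip archive held in memory into dest and return its member names."""
    with ZipFile(BytesIO(data)) as zfile:
        zfile.extractall(dest)
        return zfile.namelist()


def download_and_extract(url, dest='.'):
    """Hypothetical helper: download a zip archive from url and unpack it into dest."""
    with urlopen(url) as zurl:
        return extract_zip_bytes(zurl.read(), dest)
```

For example, `download_and_extract('http://www2.informatik.uni-freiburg.de/~cziegler/BX/BX-CSV-Dump.zip')` would unpack the CSV files in the archive into the current directory and return their names.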