Performing exploratory data analysis
The analysis begins by loading the data from the corpus. For this task, we reuse the steps already presented in the Performing exploratory data analysis section of Chapter 4, Extracting Sentiments from Product Reviews. Refer to that section for the Python code of the readCategories, parseKeysValues, and readReviews methods, which we omit in this chapter. Calling readReviews, we extract 250000 samples from the dataset:
# Read the reviews from the data.
reviews = readReviews('./data/Music.txt.gz', 250000)
reviews.shape
>> (250000, 10)
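The readReviews helper is defined in Chapter 4 and is not reproduced here. For readers who do not have that chapter at hand, the following is only a rough sketch of what a loader of this kind might look like. It assumes that the compressed file stores each review as consecutive key: value lines, with a blank line between reviews; that layout is an assumption, not something stated in this section. The function name read_records_sketch is hypothetical and is not the Chapter 4 implementation.

```python
import gzip


def read_records_sketch(path, max_samples):
    """Hypothetical loader: parse up to max_samples records from a gzip file.

    Assumes each record is a run of 'key: value' lines and that records
    are separated by blank lines. Returns a list of dictionaries.
    """
    records, current = [], {}
    with gzip.open(path, mode='rt', encoding='utf-8', errors='replace') as handle:
        for line in handle:
            line = line.strip()
            if not line:
                # A blank line closes the record collected so far.
                if current:
                    records.append(current)
                    current = {}
                    if len(records) >= max_samples:
                        break
                continue
            # Split on the first colon only, so values may contain colons.
            key, sep, value = line.partition(':')
            if sep:
                current[key.strip()] = value.strip()
    # Keep a final record that was not followed by a blank line.
    if current and len(records) < max_samples:
        records.append(current)
    return records
```

Passing the returned list to pandas.DataFrame would give a table with one row per review and one column per key, which is the kind of object on which a shape attribute such as the one shown above is available.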
Next, we will perform a couple of transformations on the data to facilitate the analysis:
# Rename the columns for convenience.
reviews.columns = ['productId', 'title', 'price', 'userId', 'profileName', 'helpfulness', 'score', 'time', 'summary', 'text'...