Sentiment analysis with spaCy
In this section, we'll train spaCy's TextCategorizer on a real-world dataset: the Amazon Fine Food Reviews dataset (https://www.kaggle.com/snap/amazon-fine-food-reviews) from Kaggle. The original dataset is large, with more than 500,000 reviews; we sampled 4,000 rows. The dataset contains customer reviews of fine foods sold on Amazon. Each review includes user and product information, the user's rating, and the review text.
You can download the dataset from the book's GitHub repository. Type the following command into your terminal:
wget https://github.com/PacktPublishing/Mastering-spaCy/raw/main/Chapter08/data/Reviews.zip
Alternatively, you can open the URL from the preceding command in your browser and download the file from there. Then unzip the archive with the following command:
unzip Reviews.zip
Alternatively, you can right-click on the ZIP file and choose Extract here to inflate the ZIP file....
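If you would rather stay in Python than use the terminal, the download and unzip steps above can be sketched with the standard library alone. This is a minimal sketch, not the book's own code: the helper names download_file, extract_zip, and main are hypothetical, and REVIEWS_URL assumes the "raw" form of the repository link, which serves the file itself rather than the GitHub web page.

```python
import urllib.request
import zipfile
from pathlib import Path

# Assumption: the "raw" form of the repository URL returns the ZIP file itself.
REVIEWS_URL = (
    "https://github.com/PacktPublishing/Mastering-spaCy/"
    "raw/main/Chapter08/data/Reviews.zip"
)


def download_file(url: str, dest: Path) -> Path:
    """Download url to dest, skipping the download if dest already exists."""
    if not dest.exists():
        urllib.request.urlretrieve(url, str(dest))
    return dest


def extract_zip(archive: Path, target_dir: Path) -> list:
    """Extract every member of archive into target_dir; return member names."""
    target_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target_dir)
        return zf.namelist()


def main() -> None:
    """Hypothetical entry point: fetch the archive and unpack it in place."""
    archive = download_file(REVIEWS_URL, Path("Reviews.zip"))
    # Print the extracted file names so you can see what the archive contained.
    print(extract_zip(archive, Path(".")))
```

Calling main() mirrors the wget and unzip commands. Because download_file skips the download when the target file already exists, rerunning the script does not fetch the archive again.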