Looking at an example
To illustrate some of these concepts, we’ll work through an example in JupyterLab in which we explore an SA task for movie reviews. We’ll look at how the NLTK and spaCy packages can give us a first picture of what the data is like, which will help us plan further processing.
The corpus (or dataset) we’ll be looking at is a popular set of 2,000 movie reviews, each labeled according to whether the writer expressed a positive or a negative sentiment about the movie (http://www.cs.cornell.edu/people/pabo/movie-review-data/).
Dataset citation
Bo Pang and Lillian Lee, Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales, Proceedings of the ACL, 2005.
This dataset is a good example of the SA task, which was introduced in Chapter 1.
Setting up JupyterLab
We’ll be working with JupyterLab, so let’s start it up. As we saw earlier, you can start JupyterLab by simply typing the...