Profiling the code
Profiling is about identifying the parts of the code that are slow or use a lot of memory. We will profile a modified version of the sentiment.py code from Chapter 9, Analyzing Textual Data and Social Media. The code is refactored to comply with multiprocessing programming guidelines (you will learn about multiprocessing later in this chapter). We also simplified the stopwords filtering. The third change is to use fewer word features, since the reduction doesn't impact accuracy; this last change has the most impact. The original code ran for about 20 seconds. The new code runs faster than that and will serve as the baseline in this chapter. Some changes have to do with profiling and will be explained later in this section. Please refer to the prof_demo.py file in this book's code bundle:
import random
from nltk.corpus import movie_reviews
from nltk.corpus import stopwords
from nltk import FreqDist
from nltk import NaiveBayesClassifier
from nltk.classify import accuracy
...
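Before looking at the full listing, it helps to see how profiling data is collected in the first place. The following is a minimal sketch, not the prof_demo.py code itself: it uses Python's standard-library cProfile and pstats modules, and the train_model() function is a placeholder standing in for the classifier training code above:

import cProfile
import pstats


def train_model():
    """Placeholder for the classifier training work we want to profile."""
    total = 0
    for i in range(10 ** 6):
        total += i * i
    return total


if __name__ == "__main__":
    profiler = cProfile.Profile()
    profiler.enable()
    train_model()
    profiler.disable()

    # Print the ten most expensive calls, sorted by cumulative time.
    stats = pstats.Stats(profiler)
    stats.sort_stats("cumulative").print_stats(10)

Equivalently, a whole script can be profiled from the command line without modifying it, for example with python -m cProfile -s cumulative prof_demo.py.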