Stop words are common words that, in natural language processing, carry little contextual meaning. They are often the most frequent words in a language. In English they tend to be function words such as articles, pronouns, prepositions, and auxiliary verbs: I, me, the, is, which, who, and at, among others. Removing these words before further processing often makes it easier to extract meaning from documents, so many tools support stop word removal. NLTK is one of them, and comes with stop word lists for roughly 22 languages.
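To make the idea concrete before turning to NLTK, here is a minimal sketch of stop word filtering in plain Python. The tiny `STOP_WORDS` set and the `remove_stop_words` helper are assumptions made up for illustration, not part of NLTK or of the recipe's code; in practice the list would come from a library, for example NLTK's `stopwords.words('english')`, which requires the stopwords corpus to be downloaded first.

```python
import re

# Hypothetical, deliberately tiny stop list for illustration only;
# a real list (such as NLTK's English list) is much longer.
STOP_WORDS = {"i", "me", "the", "is", "which", "who", "at", "a", "of"}


def remove_stop_words(text, stop_words=STOP_WORDS):
    """Return the lowercase word tokens of text that are not stop words."""
    # Crude tokenizer (an assumption): runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    return [token for token in tokens if token not in stop_words]


filtered = remove_stop_words("The cat is at the top of the stairs")
print(filtered)  # ['cat', 'top', 'stairs']
```

The comparison is done on lowercased tokens because stop word lists are typically stored in lowercase; without that step, a capitalized "The" at the start of a sentence would slip through the filter.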
Determining and removing stop words
How to do it
Proceed with the recipe as follows (code is available in 07/06_freq_dist.py):
- The following demonstrates stop word removal using NLTK. First,...