Removing stopwords
When we work with words, especially when we care about their semantics, we sometimes need to exclude very frequent words that add little meaning to a sentence (words such as but, can, and we). These are called stopwords. For example, to get a rough sense of the topic of a text, we could count its most frequent words. However, in almost any text, the most frequent words are stopwords, so we want to remove them before processing. This recipe shows how to do that. The stopwords list used in this recipe comes from the NLTK package and might not include all the words you need, so you may need to modify it.
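To illustrate why frequency counts are dominated by stopwords, here is a minimal sketch using only the Python standard library. The sample sentence, the top_words helper, and the small STOPWORDS set are all hypothetical and chosen for illustration. The set is not the NLTK list, and the whitespace tokenization is deliberately naive.

```python
from collections import Counter

# Hypothetical, hand-picked stopword set for illustration only;
# it is NOT the NLTK list that the recipe itself uses.
STOPWORDS = {"the", "a", "of", "to", "and", "in", "is", "it", "was", "but", "can", "we"}


def top_words(text, n=3, stopwords=frozenset()):
    """Return the n most frequent words in text, skipping any in stopwords."""
    # Naive tokenization: split on whitespace, strip common punctuation, lowercase.
    words = [w.strip(".,;:!?\"'").lower() for w in text.split()]
    words = [w for w in words if w and w not in stopwords]
    return Counter(words).most_common(n)


# Made-up sample sentence (an assumption, not taken from the book's text file).
text = "The detective studied the case, and the case puzzled the detective."

# Without filtering, the single most frequent word is the stopword "the".
print(top_words(text))

# With filtering, only content words such as "detective" and "case" remain.
print(top_words(text, stopwords=STOPWORDS))
```

Passing the stopword set as a parameter keeps the helper reusable: the same function could be given a larger list, or one extended with domain-specific words.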
Getting ready
We will remove stopwords using spaCy and NLTK; these packages are part of the Poetry environment that we installed earlier.
We will be using the Sherlock Holmes text referred to earlier. For this recipe, we will need just the beginning of the book, which can be found in the file at https://github...