Combining similar words – lemmatization
Lemmatization is a technique similar to stemming. The difference is that lemmatization always returns a real word: the canonical (dictionary) form of the input, known as its lemma. For example, the lemma of the word cats is cat, and the lemma of the word ran is run.
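To make the contrast with stemming concrete, here is a minimal toy sketch. It does not use any real stemmer or lemmatizer: `toy_stem`, `toy_lemmatize`, and the `TOY_LEMMAS` dictionary are hypothetical names invented for this illustration, and the suffix rules are deliberately crude.

```python
# Toy illustration only: real stemmers and lemmatizers are far more elaborate.

def toy_stem(word):
    """Strip a few common suffixes without checking that the result is a word."""
    for suffix in ("ies", "es", "s", "ing", "ed"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

# Hypothetical mini-dictionary mapping inflected forms to their lemmas.
TOY_LEMMAS = {"cats": "cat", "ran": "run", "geese": "goose", "ponies": "pony"}

def toy_lemmatize(word):
    """Look the word up in the dictionary; fall back to the word itself."""
    return TOY_LEMMAS.get(word, word)

words = ["cats", "ran", "geese", "ponies"]
stems = [toy_stem(w) for w in words]
lemmas = [toy_lemmatize(w) for w in words]
print(stems)   # ['cat', 'ran', 'geese', 'pon']
print(lemmas)  # ['cat', 'run', 'goose', 'pony']
```

The suffix-stripping function turns ponies into pon, which is not a word, and leaves irregular forms such as ran and geese untouched. The dictionary lookup returns a real word in every case, which is the property that distinguishes lemmatization.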
Getting ready
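We will be using the NLTK package for this recipe. The lemmatizer relies on the WordNet corpus, which needs to be downloaded once, for example by running nltk.download('wordnet').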
How to do it…
The NLTK package includes a lemmatizer module based on the WordNet database.
Here is how to use it:
- Import the NLTK WordNet lemmatizer:
from nltk.stem import WordNetLemmatizer
- Initialize the lemmatizer:
lemmatizer = WordNetLemmatizer()
- Initialize a list with words to lemmatize:
words = ['duck', 'geese', 'cats', 'books']
- Lemmatize the words:
lemmatized_words = [lemmatizer.lemmatize(word) for word in words]
- The result will be as follows:
['duck', 'goose', 'cat', 'book']
How it works…
In step 1, we import...