Lemmatizing words with WordNet
Lemmatization is similar to stemming, but is more akin to synonym replacement. A lemma is a root word, as opposed to a root stem, so unlike stemming, lemmatization always leaves you with a valid word that means the same thing. However, that word can look completely different from the original. A few examples will make this clear.
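To make the distinction concrete, here is a toy sketch of the two ideas. It is only an illustration: the names toy_stem, toy_lemmatize, and TOY_LEMMAS are hypothetical, the lookup table is hand-written, and neither function reflects how NLTK's stemmers or WordNet actually work.

```python
# Toy illustration only: neither function reflects how NLTK or WordNet work.

# Hypothetical, hand-written table mapping (word, part of speech) to a lemma.
TOY_LEMMAS = {
    ("cooking", "v"): "cook",
    ("believes", "n"): "belief",
}


def toy_stem(word):
    """Strip one common suffix; the result may not be a real word."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def toy_lemmatize(word, pos="n"):
    """Return the dictionary form if it is known, otherwise the word unchanged."""
    return TOY_LEMMAS.get((word, pos), word)


for word in ("cooking", "believes"):
    print(word, toy_stem(word), toy_lemmatize(word), toy_lemmatize(word, pos="v"))
```

The suffix stripper turns "believes" into "believ", which is not a word, while the table lookup returns "belief", a valid word with different letters. The lookup also depends on the part of speech, which is why the pos argument matters.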
Getting ready
Make sure that you have unzipped the wordnet corpus in nltk_data/corpora/wordnet. This will allow the WordNetLemmatizer class to access WordNet. You should also be familiar with the part-of-speech tags covered in the Looking up Synsets for a word in WordNet recipe of Chapter 1, Tokenizing Text and WordNet Basics.
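If you are unsure whether the corpus is in place, a quick check along the following lines may help. This is a minimal sketch: wordnet_available is a hypothetical helper name, and it assumes the corpus is registered under the resource name corpora/wordnet, matching the directory given above.

```python
def wordnet_available():
    """Return True if NLTK can locate the WordNet corpus, False otherwise."""
    try:
        import nltk
    except ImportError:
        return False  # NLTK itself is not installed

    try:
        # nltk.data.find raises LookupError when the resource is missing.
        nltk.data.find('corpora/wordnet')
        return True
    except LookupError:
        return False


if not wordnet_available():
    print("WordNet corpus not found; nltk.download('wordnet') can fetch it.")
```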
How to do it...
We will use the WordNetLemmatizer class to find lemmas:
>>> from nltk.stem import WordNetLemmatizer
>>> lemmatizer = WordNetLemmatizer()
>>> lemmatizer.lemmatize('cooking')
'cooking'
>>> lemmatizer.lemmatize('cooking', pos='v')
'cook'
>>> lemmatizer.lemmatize('cookbooks...