Converting words to their base forms using lemmatization
Lemmatization is another way of reducing words to their base forms. In the previous section, we saw that the base forms produced by the stemmers were not always meaningful. For example, all three stemmers reduced calves to calv, which is not a real word. Lemmatization takes a more structured approach to this problem.
The lemmatization process uses a vocabulary and morphological analysis of words. It obtains the base form by removing inflectional endings such as ing or ed. The base form of a word is known as its lemma. If you lemmatize the word calves, you should get calf as the output. Note that the output depends on whether the word is treated as a verb or a noun. Let's take a look at how to do this using NLTK.
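Before turning to NLTK, the idea of combining a vocabulary with simple morphological rules can be sketched in plain Python. This is only a toy illustration, not how NLTK implements lemmatization: the function name toy_lemmatize, the tiny vocabulary, the exception table, and the suffix rules are all hypothetical samples chosen to show why the part of speech changes the result.

```python
# Toy lemmatizer: illustrative only, NOT NLTK's implementation.
# VOCAB, EXCEPTIONS, and SUFFIX_RULES are made-up samples (assumptions).

# Known base forms, grouped by part of speech ("n" = noun, "v" = verb).
VOCAB = {
    "n": {"calf", "table", "hospital"},
    "v": {"calve", "table", "write", "run"},
}

# Irregular forms that simple suffix rules cannot recover.
EXCEPTIONS = {
    "n": {"calves": "calf"},
    "v": {"wrote": "write", "ran": "run"},
}

# (suffix to strip, replacement) pairs, tried in order for each part of speech.
SUFFIX_RULES = {
    "n": [("s", "")],
    "v": [("s", ""), ("ing", "e"), ("ing", ""), ("ed", "e"), ("ed", "")],
}


def toy_lemmatize(word, pos="n"):
    """Return a base form for word, treating it as the given part of speech."""
    word = word.lower()
    # 1. Irregular forms are looked up directly.
    if word in EXCEPTIONS[pos]:
        return EXCEPTIONS[pos][word]
    # 2. A word that is already a known base form is returned unchanged.
    if word in VOCAB[pos]:
        return word
    # 3. Strip an inflectional ending and accept the candidate only if it
    #    is a real word in the vocabulary for this part of speech.
    for suffix, replacement in SUFFIX_RULES[pos]:
        if word.endswith(suffix):
            candidate = word[: len(word) - len(suffix)] + replacement
            if candidate in VOCAB[pos]:
                return candidate
    # 4. Unknown words are left as they are.
    return word


for w in ["calves", "tables", "writing"]:
    print(w, toy_lemmatize(w, pos="n"), toy_lemmatize(w, pos="v"))
```

Because every candidate is checked against the vocabulary, the sketch never returns a non-word such as calv. In this toy setup, calves maps to calf when treated as a noun and to calve when treated as a verb. The NLTK recipe that follows relies on the WordNet database rather than hand-written tables like these.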
Create a new Python file and add the following import:
from nltk.stem import WordNetLemmatizer
Define some input words. We will be using the same set of words that...