Converting words to their base forms using lemmatization
Lemmatization is another method of reducing words to their base forms. In the previous section, we saw that some of the base forms produced by the stemmers didn't make sense. For example, all three stemmers said that the base form of calves is calv, which is not a real word. Lemmatization takes a more structured approach to solve this problem: it groups together the different inflected forms of a word so that they can be analyzed as a single item. It is like stemming, but it takes the context of a word, such as its part of speech, into account and aims to return a real dictionary word. Here are some more examples of lemmatization:
- rocks : rock
- corpora : corpus
- worse : bad
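
The mappings above can be reproduced with a small toy lemmatizer. This is only a minimal sketch for illustration, not how a production lemmatizer is implemented: the names `IRREGULAR`, `VOCABULARY`, `SUFFIX_RULES`, and `toy_lemmatize` are hypothetical, and the tiny hand-written vocabulary stands in for the full lexicon that a real lemmatizer would consult.

```python
# Toy lemmatizer: an exception table for irregular forms plus a few suffix rules.
# All names and data below are hypothetical and exist only for this illustration.

# Irregular forms that cannot be derived by stripping an ending
IRREGULAR = {
    "corpora": "corpus",
    "worse": "bad",
    "calves": "calf",
}

# Known base forms; a real lemmatizer would look these up in a full lexicon
VOCABULARY = {"rock", "corpus", "bad", "calf", "walk", "play"}

# (suffix, replacement) pairs, tried in order
SUFFIX_RULES = [("ing", ""), ("ed", ""), ("s", "")]


def toy_lemmatize(word):
    """Return a base form for `word`, or the word itself if none is found."""
    word = word.lower()

    # 1. Irregular forms are resolved by direct lookup
    if word in IRREGULAR:
        return IRREGULAR[word]

    # 2. Otherwise strip an inflectional ending, but accept the result
    #    only if it is a known word (this is what rules out "calv")
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix):
            candidate = word[:-len(suffix)] + replacement
            if candidate in VOCABULARY:
                return candidate

    # 3. Nothing matched: leave the word unchanged
    return word


if __name__ == "__main__":
    for w in ["rocks", "corpora", "worse", "calves", "walking", "played"]:
        print(w, "->", toy_lemmatize(w))
```

The vocabulary check is the key design choice in this sketch: a candidate produced by stripping an ending is accepted only if it is a known word, so a word such as thing is returned unchanged rather than being cut down to th.
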
The lemmatization process relies on lexical and morphological analysis of words. It obtains the base form by removing inflectional endings such as -ing or -ed. This base form of any word is known...