Stemming and lemmatization
In any text, it is common to find a word in multiple forms. See these, for example:
- Truck
- Trucks
- Truck's
- Trucks'
All these words share the same root, Truck. The words in the list are called inflections.
The following is a quote from Wikipedia:
Changing a word from its inflected form to its root form is called word normalization.
In natural language processing, there are two main techniques to achieve this: stemming and lemmatization.
Stemming
In stemming, we use an algorithm to reduce a word to its stem. This is not the case for lemmatization, in which we use the language's morphological...
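To make the idea of algorithmic reduction concrete, here is a minimal sketch of a suffix-stripping stemmer applied to the Truck examples. It is a toy written for illustration, not the Porter algorithm or any library's implementation; the function name toy_stem, the SUFFIXES rule list, and the three-character minimum are all assumptions made for this example.

```python
# Toy suffix-stripping stemmer: illustrative only, not a real stemming algorithm
# such as Porter's. The rule list below is a hypothetical, hand-picked set.
SUFFIXES = ("s'", "'s", "s")  # checked in order; longer suffixes come first


def toy_stem(word: str) -> str:
    """Lowercase the word and strip the first matching suffix, if any."""
    token = word.lower()
    for suffix in SUFFIXES:
        # Assumed guard: keep at least three characters so that very
        # short words such as "is" or "bus" are left untouched.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


words = ["Truck", "Trucks", "Truck's", "Trucks'"]
stems = [toy_stem(w) for w in words]
print(stems)  # all four inflections reduce to "truck"
```

Because the rules are purely mechanical, a stem does not have to be a real word: with these same rules, toy_stem("glass") returns "glas". This is typical of rule-based stemming in general.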