Converting words to their base forms using stemming
Text contains many variations of the same word. We have to deal with these different forms and enable the computer to understand that they share the same base form. For example, the word sing can appear in many forms, such as sang, singer, singing, and so on. These words have similar meanings, and humans can easily identify their base forms and derive context.
When we analyze text, it's useful to extract these base forms, because doing so lets us compute useful statistics about the input text. Stemming is one way to achieve this. The goal of a stemmer is to reduce the different forms of a word to a common base form. It is basically a heuristic process that cuts off the ends of words to extract their base forms. Let's see how to do it using NLTK.
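Before turning to NLTK, the idea of cutting off word endings can be sketched in plain Python. This is a deliberately naive illustration and not the algorithm used by any NLTK stemmer; the function name naive_stem, the suffix list, and the minimum stem length are all assumptions made up for this example.

```python
# Toy suffix stripper: illustrates the heuristic only, not a real stemmer.
# The suffix list is hypothetical and ordered so longer suffixes are tried first.
SUFFIXES = ("ing", "ers", "er", "s")


def naive_stem(word, min_stem_len=3):
    """Strip the first matching suffix, keeping at least min_stem_len characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        # Only strip when enough of the word remains to be a plausible stem.
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[:-len(suffix)]
    return word


words = ["sing", "singing", "singer", "singers", "sang"]
stems = [naive_stem(w) for w in words]
print(stems)  # ['sing', 'sing', 'sing', 'sing', 'sang']
```

Note that sang is left unchanged: a rule that only trims endings cannot relate irregular forms to their base word, which is one reason stemming is described as a heuristic.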
Create a new Python file and import the following packages:
from nltk.stem.porter import PorterStemmer
from nltk.stem.lancaster...