Coding with Gensim
Gensim bundles the common preprocessing tasks, including tokenization, stemming, and stop-word removal, into its preprocess_string() function. I am going to show you this function first. I'll also demonstrate Gensim's stop-word removal and stemming separately.
Gensim for preprocessing
Gensim's preprocess_string() function is an effective and powerful function that performs the common text preprocessing tasks in a single call, such as stop-word removal, HTML tag removal, punctuation removal, and stemming. Let's see the code:
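from gensim.parsing.preprocessing import remove_stopwords, preprocess_string
# text holds the raw input string
remove_stopwords(text)            # returns the text with stop words removed
print(preprocess_string(text))    # prints a list of cleaned, stemmed tokens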
The output is as follows:
['economi', 'look', 'solid', 'late', 'feder', 'reserv', 'offici', 'probabl', 'doubl', 'project', 'growth', 'year']
The preprocess_string() function performs...