Modeling with Gensim
Gensim's LSI module implements a fast, truncated SVD. The module is engineered to process very large corpora and can distribute the work across machines. It can also take a stream of documents as input, so the corpus does not have to fit in memory. Let's use the LsiModel class to build our first model. We will build it twice: once with the corpus in BoW and once with the corpus in TF-IDF.
BoW
Let’s start with the model with the corpus in BoW:
import gensim

lsi_model = gensim.models.lsimodel.LsiModel(
    corpus=bow_corpus,
    id2word=gensim_dictionary,
    num_topics=20,
)
Let’s explain the previous parameters:
corpus: This is the corpus of our data, that is, the document vectors the model is trained on. Here we use bow_corpus.
id2word: This is the mapping from word IDs to the words themselves. We assign our dictionary, gensim_dictionary. If we do not assign a dictionary, Gensim will show a warning message but...