Experimenting with LDA modeling
BoW and TF-IDF are two different representations of the same corpus. A model built on the BoW data will produce different results from one built on the TF-IDF data, so we will build models on both variants.
A model built on BoW data
The basic syntax of LDA is easy. The required input parameter is corpus. We assign the BoW data to build the first model, as illustrated in the following code snippet:
from gensim.models import LdaModel
lda_bow = LdaModel(bow_corpus, num_topics=10, id2word=gensim_dictionary)
I’d like to review several important model inputs, as follows:
num_topics: This is the number of topics. In this experiment, we will just assign 10. We will learn how to determine the optimal number of topics in the Determining the optimal number of topics section.
random_state=None: This is helpful for reproducibility.
id2word: We assign our gensim_dictionary dictionary from our corpus. If we do not assign a dictionary, Gensim...