Gensim for topic modeling
We already used the Gensim library in Chapter 7, Automatic Text Summarization, to extract keywords and summaries from text. Here we will use it to build a topic model of a collection of texts. Just as we did in earlier chapters, we will practice with a few different types of document collections and see how the results vary.
First, we will build a small test program to make sure that Gensim and LDA are installed correctly and able to generate a topic model from a collection of documents. If Gensim is not installed in your Anaconda environment, run the following command in your terminal:
conda install gensim
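Before going further, it can help to confirm that Python can actually find the package. The following is a minimal sketch using only the standard library, not part of the chapter's test program. The helper name gensim_status is hypothetical and introduced here purely for illustration.

```python
import importlib.util
from importlib import metadata


def gensim_status():
    """Return the installed Gensim version string, or None if Gensim is missing.

    gensim_status is a hypothetical helper for this check, not a Gensim function.
    """
    # find_spec looks the package up without importing it
    if importlib.util.find_spec("gensim") is None:
        return None
    try:
        return metadata.version("gensim")
    except metadata.PackageNotFoundError:
        # The package is importable but has no distribution metadata
        return "unknown"


version = gensim_status()
if version is None:
    print("Gensim not found; run: conda install gensim")
else:
    print(f"Gensim is installed (version {version})")
```

If the script reports that Gensim was not found even after installation, the likely cause is that the terminal and the Python interpreter are using different environments.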
We begin by importing the Gensim libraries and a PrettyPrinter for formatting:
from gensim import corpora
from gensim.models.ldamodel import LdaModel
from gensim.parsing.preprocessing import STOPWORDS
import pprint
We will need some variables that let us adjust the model. As we learn how topic modeling works, we will tweak these values to see how the results change...