Chapter 6: Topic Modeling
In this chapter, we will cover topic modeling, the unsupervised discovery of topics in a corpus of text. Many algorithms are available for this, and we will cover four of them: Latent Dirichlet Allocation (LDA), implemented with two different packages (sklearn and gensim); non-negative matrix factorization (NMF); K-means clustering with Bidirectional Encoder Representations from Transformers (BERT) embeddings; and Gibbs Sampling Dirichlet Multinomial Mixture (GSDMM), which is suited to topic modeling of short texts such as sentences or tweets.
This chapter contains the following recipes:
- LDA topic modeling with sklearn
- LDA topic modeling with gensim
- NMF topic modeling
- K-means topic modeling with BERT
- Topic modeling of short texts