Summary
In this chapter, we explored the reasons for LDA's instability and irreproducibility. The root cause is that a single LDA model mixes "true" topics with "pseudo" topics and therefore produces noisy predictions. We learned how Ensemble LDA delivers stable outcomes: it trains an ensemble of topic models and discards topics that do not recur across the ensemble, thereby separating "pseudo" topics from "true" topics and keeping only the "true" topics as the final result. We also learned how CBDBSCAN forms clusters of recurring topics. Finally, we built some Ensemble LDA models and used them to examine new documents.
With the arrival of the transformer architecture, even more topic-modeling algorithms have been developed in NLP, the most prominent of which is BERT-based topic modeling. In the next chapter, we will...