The Ensemble LDA for Model Stability
One of the success criteria of topic modeling is that it produces a reliable set of topics. However, many experiments with Latent Dirichlet Allocation (LDA) have shown that its topics can be unstable and hard to reproduce, which seriously limits the applications of LDA. The instability arises partly because the model settles at a local maximum that depends on the random initialization. Even when a random seed is fixed to control the initialization, noisy topics can still be generated during the modeling process, and these can degrade the quality of the outcome.
The root cause of the instability is that a single LDA model identifies both “true” topics and “pseudo” topics, and therefore produces noisy predictions. If the model is trained again, it will again identify the “true” topics, along with a different set of “pseudo” topics. The solution is to build multiple models, that is, an ensemble of models, to weed out the pseudo...
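The ensemble idea can be illustrated with a minimal sketch. It assumes that each training run yields a list of topic-word probability vectors over a shared vocabulary, and that a topic counts as "true" when a sufficiently similar topic shows up in every run. The function names, the cosine-similarity criterion, the 0.9 threshold, and the toy numbers are all assumptions made for illustration, not the procedure of any particular library.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def stable_topics(runs, sim_threshold=0.9, min_support=None):
    """Keep topics from the first run that recur in enough runs.

    runs: list of runs; each run is a list of topic-word vectors.
    sim_threshold: hypothetical cutoff for calling two topics "the same".
    min_support: number of runs (including the first) that must contain
        a matching topic; defaults to all runs.
    """
    if min_support is None:
        min_support = len(runs)
    kept = []
    for topic in runs[0]:
        # Count the runs that contain at least one topic close to this one.
        support = sum(
            1
            for run in runs
            if any(cosine(topic, other) >= sim_threshold for other in run)
        )
        if support >= min_support:
            kept.append(topic)
    return kept


# Toy data (made up): three runs over a 4-word vocabulary. The first two
# topics of each run are near-identical across runs; the third differs.
run_a = [[0.70, 0.20, 0.05, 0.05], [0.05, 0.05, 0.20, 0.70], [0.25, 0.25, 0.25, 0.25]]
run_b = [[0.68, 0.22, 0.05, 0.05], [0.05, 0.05, 0.22, 0.68], [0.10, 0.60, 0.25, 0.05]]
run_c = [[0.72, 0.18, 0.05, 0.05], [0.06, 0.04, 0.18, 0.72], [0.40, 0.05, 0.50, 0.05]]

runs = [run_a, run_b, run_c]
kept = stable_topics(runs)
print(len(kept), "of", len(run_a), "topics from the first run recur in every run")
```

For simplicity, this sketch only considers topics from the first run as candidates. A fuller ensemble would typically pool the topics from all runs and cluster them before deciding which ones are stable.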