From LDA to Ensemble LDA
Suppose a corpus contains only three distinct words, and these words are associated with three topics. Figure 13.1 illustrates this idea. Each vertex of the simplex is one of the three words, so every topic, being a probability distribution over those words, is a point inside the simplex. The three topics are labeled Topic A, Topic B, and Topic C in the left simplex. However, LDA may also identify a fourth topic that is a combination of the three. This is a “pseudo” topic, shown in the middle of the simplex.
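To make the geometry concrete, here is a small Python sketch with made-up numbers. The topic weights and the helper names `is_on_simplex` and `mix` are hypothetical and are not taken from Figure 13.1. Each topic is a probability vector over the three words, so it is a point on the simplex. An equal mixture of three topics that each lean toward a different vertex lands at the center, which is where the pseudo topic sits.

```python
# Hypothetical topics over a three-word vocabulary (word1, word2, word3).
# Each topic is a probability vector, i.e. a point on the 2-simplex.
topic_a = (0.8, 0.1, 0.1)  # leans toward the word1 vertex
topic_b = (0.1, 0.8, 0.1)  # leans toward the word2 vertex
topic_c = (0.1, 0.1, 0.8)  # leans toward the word3 vertex


def is_on_simplex(p, tol=1e-9):
    """A valid topic has non-negative weights that sum to 1."""
    return all(x >= 0 for x in p) and abs(sum(p) - 1.0) < tol


def mix(*topics):
    """Equal-weight mixture of topics; the result is still a point in the simplex."""
    n = len(topics)
    return tuple(sum(t[i] for t in topics) / n for i in range(len(topics[0])))


# Mixing the three "true" topics gives a point near the center of the simplex,
# the location of the pseudo topic in the figure.
pseudo = mix(topic_a, topic_b, topic_c)
print([round(x, 3) for x in pseudo])  # prints [0.333, 0.333, 0.333]
```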
Figure 13.1 – Applying the ensembling method to LDA
Let’s take an ensembling approach by building many LDA models on the same data. Most of the LDA models will find topics A, B, and C, and some will also produce pseudo topics in addition to topics A, B, and C. Across the ensemble, the “true” topics A, B, and C should therefore appear frequently, while the “pseudo” topics should appear less often. This idea is demonstrated in the simplex on the right-hand side of Figure 13.1. All the blue...
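The frequency argument can be sketched in a few lines of Python. This is a simplified illustration of the voting idea only, not the algorithm behind gensim's `EnsembleLda`, which uses a more elaborate clustering step. The assumptions: each run's topics are already available as probability vectors, two topics count as "the same" when their Hellinger distance is below a hypothetical `threshold`, and a topic is kept when it shows up in at least a hypothetical `min_share` of the runs. The run data below is made up.

```python
import math


def hellinger(p, q):
    """Hellinger distance between two discrete distributions (0 means identical)."""
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))


def stable_topics(runs, threshold=0.2, min_share=0.5):
    """Group topics from many LDA runs and keep groups seen in >= min_share of runs.

    runs: list of runs; each run is a list of topic vectors (hypothetical input).
    Returns the first-seen member of each group that passes the frequency filter.
    """
    centers = []    # representative vector of each group (its first member)
    runs_seen = []  # run indices that contributed a topic to each group
    for r, topics in enumerate(runs):
        for t in topics:
            # Assign the topic to the closest existing group, if it is close enough.
            best, best_d = None, threshold
            for g, c in enumerate(centers):
                d = hellinger(t, c)
                if d < best_d:
                    best, best_d = g, d
            if best is None:
                centers.append(t)       # start a new group
                runs_seen.append({r})
            else:
                runs_seen[best].add(r)
    n_runs = len(runs)
    return [c for c, seen in zip(centers, runs_seen) if len(seen) / n_runs >= min_share]


# Made-up runs: A, B, C appear in every run; the pseudo topic P appears in 2 of 5.
A, B, C = (0.8, 0.1, 0.1), (0.1, 0.8, 0.1), (0.1, 0.1, 0.8)
A_noisy = (0.75, 0.15, 0.1)  # a slightly different estimate of topic A
P = (1 / 3, 1 / 3, 1 / 3)
runs = [[A, B, C], [A, B, C, P], [A_noisy, B, C], [A, B, C], [A, B, C, P]]
print(stable_topics(runs))  # keeps A, B, C and drops the infrequent pseudo topic
```

Raising `min_share` makes the filter stricter, while `threshold` controls how similar two topics must be before they are counted as the same topic.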