Understanding the structure of LDA
LDA assumes that a corpus is a collection of D documents and that each document is a mixture of k topics, where each topic is a distribution over words. Let’s imagine a printing “plate” that generates a document, d, along with its topics and its bag of words. This plate is shown in Figure 10.5. Let’s read it from left to right:
Figure 10.5 – Graphical representation of LDA
First, the outer rectangle, D, represents the collection of D documents, and the inner rectangle represents each document, d. Each document d has its own topic distribution, θ_d, which is drawn from a Dirichlet distribution with a scalar parameter, α:
θ_d ~ Dirichlet(α)    Eq. (1)
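To make Eq. (1) concrete, here is a minimal sketch that draws a topic distribution for a handful of documents with NumPy. The function name `sample_topic_mixtures` and the values of `num_docs`, `num_topics`, and `alpha` are illustrative assumptions, not part of LDA itself. NumPy expects the full concentration vector, so the scalar α is repeated once per topic.

```python
import numpy as np


def sample_topic_mixtures(num_docs, num_topics, alpha, seed=0):
    """Draw one topic distribution per document from a symmetric Dirichlet.

    All names and defaults here are hypothetical, chosen for illustration.
    """
    rng = np.random.default_rng(seed)
    # Repeat the scalar alpha once per topic to form the concentration vector.
    concentration = np.full(num_topics, alpha)
    # Each row of the result is the topic distribution of one document.
    return rng.dirichlet(concentration, size=num_docs)


theta = sample_topic_mixtures(num_docs=5, num_topics=3, alpha=0.1)
print(theta.round(3))     # one row per document, one column per topic
print(theta.sum(axis=1))  # each row sums to 1 (up to floating-point error)
```

With a small α such as 0.1, most rows tend to put nearly all of their mass on a single topic; larger values of α tend to spread the mass more evenly across topics.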
The parameter of the Dirichlet distribution should be a vector, α = [α_1, ..., α_k], but because all elements of α are the same, α can be written as a...