Latent Dirichlet Allocation
The last model we want to present in this book is called the Latent Dirichlet Allocation. It is a generative model that can be represented as a graphical model. It's based on the same idea as the mixture model, with one notable exception. In this model, we assume that the data points might be generated by a combination of clusters and not just one cluster at a time, as was the case before.
The LDA model is primarily used in text analysis and classification. Let's consider that a text document is composed of words making sentences and paragraphs. To simplify the problem, we can say that each sentence or paragraph is about one specific topic, such as science, animals, sports, and so on. Topics can also be more specific, such as cats or European soccer. Therefore, there are words that are more likely to come from specific topics. For example, the word cat is likely to come from the topic cats. The word stadium is likely to come from the topic European soccer. However...