Understanding the idea behind LDA
LDA assumes that each topic has a distinctive distribution of words. It analyzes word frequencies to discover hidden topics and estimates the probability that a word belongs to each topic. Its generative modeling approach is a distinguishing feature. Think of the hidden topics as templates in a printing shop: each topic template has its own set of words, and an article is generated from one topic template or from a mixture of topic templates. The name itself, latent Dirichlet allocation, describes this approach. Latent refers to the hidden topics, which are never observed directly and must be inferred from the words. Dirichlet refers to the assumption that both the distribution of topics in a document and the distribution of words in a topic are drawn from Dirichlet distributions. Allocation means that the words generated from the topic templates are allocated to a document according to its mixture of topics.
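The printing-shop picture can be made concrete with a small simulation. The following is only a minimal sketch of the generative story, not how LDA is fitted in practice: the function names sample_dirichlet and generate_document, the toy vocabulary, and the hyperparameter values are all assumptions chosen for illustration. It uses only the Python standard library and draws Dirichlet samples by normalizing Gamma draws.

```python
import random


def sample_dirichlet(alphas, rng):
    """Draw one probability vector from a Dirichlet distribution.

    Uses the standard construction: independent Gamma(alpha_i, 1)
    draws divided by their sum.
    """
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(topic_word_dists, vocab, doc_alpha, n_words, rng):
    """Generate one document from the topic templates.

    topic_word_dists: one word distribution per topic (the "templates").
    doc_alpha: illustrative symmetric Dirichlet parameter for the
               document's topic mixture.
    """
    n_topics = len(topic_word_dists)
    # The document's own mixture of topics, drawn from a Dirichlet.
    theta = sample_dirichlet([doc_alpha] * n_topics, rng)
    words = []
    for _ in range(n_words):
        # Allocate this word position to a topic according to the mixture.
        topic = rng.choices(range(n_topics), weights=theta, k=1)[0]
        # Draw the word from the chosen topic's template.
        word = rng.choices(vocab, weights=topic_word_dists[topic], k=1)[0]
        words.append(word)
    return theta, words


rng = random.Random(0)  # fixed seed so the toy run is repeatable
vocab = ["goal", "match", "team", "vote", "party", "election"]  # toy vocabulary

# Two topic templates, each a word distribution drawn from a Dirichlet.
topic_word_dists = [sample_dirichlet([0.5] * len(vocab), rng) for _ in range(2)]

theta, doc = generate_document(topic_word_dists, vocab, doc_alpha=0.5,
                               n_words=12, rng=rng)
print("topic mixture:", [round(p, 3) for p in theta])
print("document:", " ".join(doc))
```

Fitting LDA runs this story in reverse: given only the documents, it infers the topic templates and each document's mixture that most plausibly produced them.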
Typically, a document devotes part of its text to one topic and the rest to other topics. LDA tags a document into several topics...