Summary
In this chapter, we learned about the structure of LDA and the mathematical process by which it arrives at its results. LDA assumes that each document is generated from a distribution over topics and that each topic is a distinctive distribution over words. Both of these distributions are latent. Through generative modeling, which generates data from the assumed distributions and fits it to the observed documents, LDA can recover the hidden distributions. We learned how the variational E-M algorithm estimates the parameters of these hidden distributions, and we compared the advantages of the two algorithms typically used to solve the optimization problem of LDA: variational E-M and Gibbs sampling.
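To make the generative story concrete, the standard notation for LDA's two latent distributions and the sampling steps they drive is sketched below; the symbols follow the common convention (alpha and beta are the Dirichlet priors, which this summary does not fix to particular values):

\theta_d \sim \operatorname{Dirichlet}(\alpha) \qquad \text{topic distribution of document } d
\varphi_k \sim \operatorname{Dirichlet}(\beta) \qquad \text{word distribution of topic } k
z_{d,n} \sim \operatorname{Multinomial}(\theta_d) \qquad \text{topic assigned to the } n\text{th word of } d
w_{d,n} \sim \operatorname{Multinomial}(\varphi_{z_{d,n}}) \qquad \text{the observed word itself}

Only the words w_{d,n} are observed; variational E-M and Gibbs sampling are two routes to inferring theta and phi from them.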
In the next chapter, we will develop Python code with Gensim to conduct LDA topic modeling on real-world data. We will make technical decisions, such as how to determine the optimal number of topics and how to use the model to score...
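As a brief preview, here is a minimal sketch of what such Gensim code might look like; the toy texts, the num_topics value, and the "c_v" coherence measure are illustrative assumptions, not the next chapter's final settings:

from gensim.corpora import Dictionary
from gensim.models import CoherenceModel, LdaModel

# A toy tokenized corpus; the next chapter will use real-world data instead.
texts = [["topic", "modeling", "discovers", "themes"],
         ["lda", "assumes", "latent", "topic", "distributions"],
         ["gensim", "fits", "lda", "with", "variational", "inference"]]

dictionary = Dictionary(texts)                    # map each word to an integer id
corpus = [dictionary.doc2bow(t) for t in texts]   # bag-of-words vectors

# Gensim's LdaModel is fit with online variational Bayes, a variant of
# the variational E-M approach discussed in this chapter.
lda = LdaModel(corpus=corpus, id2word=dictionary,
               num_topics=2, passes=10, random_state=42)

# One common way to compare candidate numbers of topics: a coherence score.
coherence = CoherenceModel(model=lda, texts=texts, dictionary=dictionary,
                           coherence="c_v").get_coherence()
print(lda.print_topics())
print(coherence)

Rerunning this loop over several num_topics values and comparing the coherence scores is one way the "optimal number of topics" decision can be made, as we will see in detail.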