We will now consider another kind of decomposition that is extremely helpful when working with text documents (that is, in NLP). The theory is not straightforward, because it requires a solid knowledge of probability theory and statistical learning (the complete treatment can be found in the original paper: Blei D., Ng A., and Jordan M., Latent Dirichlet Allocation, Journal of Machine Learning Research, 3, (2003), 993-1022); therefore, we are only going to discuss the main elements, without going into the mathematical details (a more compact description is also present in Bonaccorso G., Machine Learning Algorithms, Second Edition, Packt Publishing, 2018).

Let's consider a set of text documents, dj (called a corpus), whose atoms (or components) are the words, wi.
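Following the conventional notation of the LDA literature (a sketch only; the symbols M and Nj, denoting the number of documents and the length of the j-th document, are introduced here for clarity), the corpus and its documents can be written as:

Corpus = {d1, d2, ..., dM} with dj = (w1, w2, ..., wNj)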
After collecting all of the words, we can build a dictionary containing every distinct term that appears in the corpus.
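As a minimal sketch of these two steps (the toy corpus and the naive whitespace tokenization are illustrative assumptions, not material from the original example), the documents can be split into words and the dictionary collected in a few lines of Python:

```python
# Toy corpus: a list of documents d_j, each one a short string (illustrative only).
corpus = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "cats and dogs can be friends",
]

# Each document is reduced to its atoms (words) with a trivial whitespace split;
# a real pipeline would also lowercase, strip punctuation, and remove stop words.
documents = [document.split() for document in corpus]

# The dictionary collects every distinct word w_i appearing in the corpus.
dictionary = sorted({word for document in documents for word in document})

print(len(dictionary), "distinct words")
print(dictionary)
```

In practice, the same result is usually obtained with a vectorizer (for example, scikit-learn's CountVectorizer), whose fitted vocabulary plays exactly the role of this dictionary.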
We can also state the following...