Factor Analysis
Let's suppose we have a Gaussian data-generating process, $p_{data} \sim N(0, \Sigma)$, and a set of M n-dimensional zero-centered samples drawn from it:

$$X = \{ x_1, x_2, \ldots, x_M \} \quad \text{with } x_i \in \mathbb{R}^n$$
If $p_{data}$ has a non-null mean $\mu$, it's also possible to use this model, but it's necessary to account for this non-null value with slight changes to some of the formulas. As zero-centering normally has no drawbacks, it's easier to remove the mean and so simplify the model.
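The centering step itself is a one-liner. The following is a minimal NumPy sketch; the sample count, dimensionality, and covariance are purely illustrative and not taken from the text:

```python
import numpy as np

# Illustrative parameters (not from the text): M samples, n dimensions
M, n = 1000, 5

rng = np.random.default_rng(0)

# Draw M samples from a Gaussian with a non-null mean and an arbitrary covariance
mu = rng.normal(size=n)
A = rng.normal(size=(n, n))
Sigma = A @ A.T  # any symmetric positive semi-definite matrix works
X = rng.multivariate_normal(mean=mu, cov=Sigma, size=M)

# Zero-center the dataset so that the model can assume p_data ~ N(0, Sigma)
X_zero = X - X.mean(axis=0)

print(X_zero.mean(axis=0))  # close to the zero vector
```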
One of the most common problems in unsupervised learning is finding a lower-dimensional distribution $p_{lower}$ such that the Kullback-Leibler divergence with $p_{data}$ is minimized. When performing a factor analysis (FA), following the original proposal published in Rubin D., Thayer D., EM algorithms for ML factor analysis, Psychometrika, 47/1982, Issue 1, and Ghahramani Z., Hinton G. E., The EM algorithm for Mixtures of Factor Analyzers, CRG-TR-96-1, 05/1996, we start from the assumption that we can model the generic data point as a linear combination of Gaussian latent variables plus an additive noise term.
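To make this assumption concrete, here is a minimal sketch of the standard FA generative model, fitted with scikit-learn's FactorAnalysis. The dimensions, latent size, noise scale, and random seed are illustrative assumptions, not values from the text:

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(1)

# Illustrative sizes (not from the text): n observed features, p latent factors
M, n, p = 2000, 10, 3

# Standard FA generative model: x = A z + noise, with z ~ N(0, I)
# and decorrelated Gaussian noise (diagonal covariance)
A = rng.normal(size=(n, p))            # factor loading matrix
Z = rng.normal(size=(M, p))            # Gaussian latent variables
noise = rng.normal(scale=0.1, size=(M, n))
X = Z @ A.T + noise                    # zero-centered by construction

# Fit a lower-dimensional model with scikit-learn's FactorAnalysis
fa = FactorAnalysis(n_components=p)
Z_hat = fa.fit_transform(X)            # estimated latent coordinates, shape (M, p)

print(fa.components_.shape)            # (p, n): estimated loading matrix
print(fa.noise_variance_.shape)        # (n,): estimated diagonal noise variances
```

In this sketch, the loading matrix and the diagonal noise variances recovered by the estimator play the roles of the linear combination and the additive noise term described above.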