Understanding topic modeling
A topic model is a statistical model of the topics in a document. The assumption is that if 10 percent of a document talks about the military and 40 percent of it talks about the economy (and 50 percent talks about other things), then there should be roughly four times as many words about the economy as about the military.
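The arithmetic behind this assumption can be sketched in a few lines of Python. This is only an illustration of the proportions above, not part of any topic modeling library: the function name expected_word_counts and the 1,000-word document length are hypothetical choices made for the example.

```python
def expected_word_counts(topic_percentages, total_words):
    """Return the expected number of words per topic.

    topic_percentages maps a topic name to its share of the document,
    expressed as a whole-number percentage.
    """
    return {
        topic: total_words * percent / 100
        for topic, percent in topic_percentages.items()
    }


# Topic shares taken from the example in the text.
percentages = {"military": 10, "economy": 40, "other": 50}

# Assumed document length, chosen only to make the numbers easy to read.
counts = expected_word_counts(percentages, total_words=1000)

# How many economy words we expect for each military word.
ratio = counts["economy"] / counts["military"]

print(counts)
print(ratio)
```

A real document will not match these counts exactly, which is why the text says "roughly": the proportions describe what to expect on average, not a guarantee for any single document.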
An early form of topic modeling was described by Christos Papadimitriou and others in their 1998 paper, Latent Semantic Indexing: A Probabilistic Analysis (http://www.cs.berkeley.edu/~christos/ir.ps). This was refined by Thomas Hofmann in 1999 with Probabilistic Latent Semantic Indexing (http://www.cs.brown.edu/~th/papers/Hofmann-SIGIR99.pdf).
In 2003, David Blei, Andrew Ng, and Michael I. Jordan published their paper, Latent Dirichlet Allocation (http://jmlr.csail.mit.edu/papers/v3/blei03a.html). Currently, this is the most common type of topic modeling. It's simple, easy to get started with, and widely available. Most work in the field since then...