We will now make our first foray into probabilistic models and machine learning with text. We did, of course, come across such models earlier on (in Chapter 5, POS-Tagging and Its Applications, Chapter 6, NER-Tagging and Its Applications, and Chapter 7, Dependency Parsing), especially in the way we trained our NER and POS taggers, but our goal in the previous chapters was not to come up with a statistical model involving our text data.
What is a topic model? As the name might suggest, it is a probabilistic model which contains information about topics in the text. We now must ask what exactly a topic is - we can understand a topic as a theme, or underlying ideas represented in text. For example, if we are working with a corpus of newspaper articles, possible topics would be weather, politics, sport, and so on.
Why would such topic models be important in...