Data analysis
Introduction to topic models
Wikipedia defines a topic model as follows:
"In machine learning and natural language processing, a topic model is a type of statistical model for discovering the abstract "topics" that occur in a collection of documents. Topic modeling is a frequently used text-mining tool for discovery of hidden semantic structures in a text body."
Topic models are essentially iterative algorithms that work on document-feature matrices, using overlapping features to group documents together. The features could simply be all the words in a sentence, or a selected subset such as nouns or named entities. To explain in simple terms, imagine a corpus of documents on mixed subjects in which words serve as the features that represent each document. If we analysed these documents with a topic model, it would group words like "team", "match", "game", and "score" into a single topic (as these words frequently appear together...
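To make the idea of a document-feature matrix more concrete, here is a minimal sketch in plain Python. It is not a topic model: it only builds the word-count matrix that such a model would take as input, and lists the features that two documents share. The toy corpus, the stop-word list, and the helper names (tokenize, build_matrix, shared_features) are all assumptions made up for this illustration.

```python
from collections import Counter

# Hypothetical toy corpus, invented for illustration only.
docs = [
    "the team won the match with a late score",
    "the game ended after the team missed a score",
    "the match was the best game of the season",
    "the bank raised the interest rate on the loan",
    "the loan rate depends on the bank",
]

# Assumed stop-word list, kept deliberately tiny.
STOP_WORDS = {"the", "a", "with", "after", "was", "of", "on"}


def tokenize(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]


def build_matrix(tokenized_docs, vocab):
    """Return one row of word counts per document, one column per vocab word."""
    rows = []
    for doc in tokenized_docs:
        counts = Counter(doc)
        rows.append([counts[w] for w in vocab])
    return rows


tokenized = [tokenize(d) for d in docs]
# Vocabulary: every distinct feature (here, every remaining word), sorted.
vocab = sorted({w for doc in tokenized for w in doc})
matrix = build_matrix(tokenized, vocab)


def shared_features(i, j):
    """Words that occur in both document i and document j."""
    return {w for k, w in enumerate(vocab) if matrix[i][k] and matrix[j][k]}


print(sorted(shared_features(0, 1)))  # ['score', 'team']
print(sorted(shared_features(3, 4)))  # ['bank', 'loan', 'rate']
print(sorted(shared_features(0, 3)))  # []
```

The two sports documents overlap on "team" and "score", the two finance documents overlap on "bank", "loan", and "rate", and a sports document shares nothing with a finance one. A topic model exploits this kind of overlap statistically across the whole corpus rather than pair by pair.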