When we have a collection of documents whose categories are not known in advance, topic models help us find a rough categorization. The model treats each document as a mixture of topics, often with one dominant topic.
For example, let's suppose we have the following sentences:
- Eating fruits as snacks is a healthy habit
- Exercising regularly is an important part of a healthy lifestyle
- Grapefruit and oranges are citrus fruits
A topic model of these sentences may output the following:
- Topic A: 40% healthy, 20% fruits, 10% snacks
- Topic B: 20% grapefruit, 20% oranges, 10% citrus
- Sentences 1 and 2: 80% Topic A, 20% Topic B
- Sentence 3: 100% Topic B
From the output of the model, we can guess that Topic A is about health and Topic B is about fruits. Though these topics are not known a priori, the model outputs corresponding probabilities for words...
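To make the "mixture of topics" idea concrete, here is a minimal Python sketch that takes the illustrative numbers from the example as given and combines them. It does not fit a topic model; it only shows how a document's topic proportions and each topic's word probabilities would combine into the probability of a word in that document, and how the dominant topic can be read off. The names `topic_word`, `doc_topic`, `word_probability`, and `dominant_topic` are made up for this illustration, and words not listed for a topic are treated as having probability 0 for simplicity, although in a real model they would carry the remaining probability mass.

```python
# Topic-word probabilities copied from the example output above.
# Only the listed words are included; unlisted words are assumed to be 0 here.
topic_word = {
    "A": {"healthy": 0.40, "fruits": 0.20, "snacks": 0.10},
    "B": {"grapefruit": 0.20, "oranges": 0.20, "citrus": 0.10},
}

# Document-topic proportions copied from the example output above.
doc_topic = {
    "sentence_1": {"A": 0.80, "B": 0.20},
    "sentence_2": {"A": 0.80, "B": 0.20},
    "sentence_3": {"A": 0.00, "B": 1.00},
}


def word_probability(doc, word):
    """P(word | doc) = sum over topics of P(topic | doc) * P(word | topic)."""
    return sum(
        weight * topic_word[topic].get(word, 0.0)
        for topic, weight in doc_topic[doc].items()
    )


def dominant_topic(doc):
    """Return the topic with the largest proportion in the document."""
    proportions = doc_topic[doc]
    return max(proportions, key=proportions.get)


if __name__ == "__main__":
    for doc in doc_topic:
        print(doc, "dominant topic:", dominant_topic(doc))
    # 0.8 * 0.40 (Topic A) + 0.2 * 0.0 (Topic B)
    print("P(healthy | sentence_1) ~", round(word_probability("sentence_1", "healthy"), 2))
```

In practice, both tables would be estimated from the documents by a topic-modeling algorithm rather than written by hand; the sketch only illustrates how the two kinds of output relate to each other.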