Using the coherence score to find the optimal number of topics
The previous section arbitrarily set the number of topics to 20. Is this the optimal number? To investigate, we need to understand the “scope” of a topic. The words in a topic can be loosely connected or closely connected. A topic with closely connected words is distinctive; a topic with loosely connected words is not. In other words, the “closeness” of the words in a topic is an important measure. If the words in a topic are only very loosely connected, the topic may be better split into two or more topics.
To measure the “closeness” of a topic, Röder, Both, and Hinneburg (2015) [5] proposed a metric called the coherence score. The score is defined as the average or median of the pairwise word similarities computed over the top words of a given topic. The value of a coherence score doesn’t have a universal meaning on its own because it varies, based on the scoring...
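To make the idea of “average or median of pairwise word similarities” concrete, here is a minimal sketch in plain Python. It is an illustration only, not the measure from [5]: the similarity function (Jaccard overlap of the documents in which two words appear), the helper names `doc_sets`, `jaccard`, and `topic_coherence`, and the toy corpus are all assumptions made for this example.

```python
from itertools import combinations
from statistics import mean, median


def doc_sets(docs):
    """Map each word to the set of document indices that contain it."""
    index = {}
    for i, doc in enumerate(docs):
        for word in set(doc):
            index.setdefault(word, set()).add(i)
    return index


def jaccard(a, b, index):
    """Assumed similarity: overlap of the documents containing a and b."""
    da, db = index.get(a, set()), index.get(b, set())
    union = da | db
    return len(da & db) / len(union) if union else 0.0


def topic_coherence(top_words, index, aggregate=mean):
    """Aggregate (mean or median) the similarity of every pair of top words."""
    scores = [jaccard(a, b, index) for a, b in combinations(top_words, 2)]
    return aggregate(scores) if scores else 0.0


# Hypothetical toy corpus: each document is a list of tokens.
docs = [
    ["cat", "dog", "pet"],
    ["cat", "dog", "vet"],
    ["stock", "market", "trade"],
    ["stock", "market", "cat"],
]
index = doc_sets(docs)

# A topic whose top words co-occur often versus one whose words rarely do.
tight = topic_coherence(["cat", "dog", "pet"], index)
loose = topic_coherence(["dog", "market", "pet"], index)
tight_median = topic_coherence(["cat", "dog", "pet"], index, aggregate=median)

print(f"tight topic: {tight:.3f}, loose topic: {loose:.3f}")
```

In this toy setup the closely connected topic scores higher than the loosely connected one. Only the comparison between the two is meaningful; the absolute values depend entirely on the similarity function chosen.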