Gensim isn't the only package that lets us build topic models: scikit-learn, while not dedicated to text, still offers fast implementations of LDA and Non-negative Matrix Factorization (NMF), which can help us identify topics.
We have already discussed how LDA works, and the only differences between the Gensim and scikit-learn implementations are as follows:
- The perplexity bounds are not expected to agree exactly, because the bound is calculated differently in Gensim and in scikit-learn. These bounds are how we measure the convergence of topics in topic modeling algorithms.
- Scikit-learn uses Cython, which introduces numerical differences at around the sixth decimal place.
Unlike LDA, non-negative matrix factorization (NMF) [15] is not a method used mostly for text mining (though interestingly, LDA's variants have also been used in genetics and image processing...