Summary
In this chapter, we learned to build LDA models with BoW data and with TF-IDF data, and we compared the outcomes of the two. We then built more LDA models over a range of topic counts. We learned what the coherence score is and how to use it to determine the optimal number of topics. We also learned that the prediction for a document is a list of 2-tuples, each pairing a topic ID with its topic probability. The LDA model represents a document as a distribution over topics and each topic as a distribution over words. Such rich results pose another challenge: what is the best way to visualize them?
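As a compact reminder of two outputs recapped above, the following sketch post-processes them in plain Python: it finds the dominant topic in a document's list of (topic ID, topic probability) 2-tuples, and it picks a topic count from coherence scores. The topic IDs, probabilities, and coherence scores are made-up illustrative values, not results from this chapter, and taking the topic count with the highest coherence is only one simple selection rule among several.

```python
# Hypothetical prediction for one document: a list of
# (topic_id, topic_probability) 2-tuples. Values are illustrative only.
doc_topics = [(0, 0.12), (3, 0.71), (7, 0.17)]

# The dominant topic is the tuple with the highest probability.
dominant_topic, dominant_prob = max(doc_topics, key=lambda t: t[1])

# Hypothetical coherence scores keyed by number of topics; in practice
# each score would come from an LDA model fitted with that topic count.
coherence_by_k = {5: 0.41, 10: 0.48, 15: 0.46, 20: 0.43}

# One simple rule: pick the number of topics with the highest coherence.
best_k = max(coherence_by_k, key=coherence_by_k.get)

print(dominant_topic, dominant_prob)  # 3 0.71
print(best_k)  # 10
```

In practice, inspecting the whole coherence curve, rather than only its maximum, can reveal a smaller topic count with nearly the same score.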
In the next chapter, we will learn how to design a visualization tool that presents this rich content and communicates it effectively.