Until now, we have dealt with computational linguistics algorithms and spaCy, and we have seen how to use these algorithms to annotate our data and to understand sentence structure. While these algorithms helped us understand the finer details of our text, they still don't give us a big-picture view of our data. Which words appear more often than others in our corpus? Can we group our data or find underlying themes? We will attempt to answer these questions, and more, in this chapter. We will cover the following topics:
- What are topic models?
- Topic models in Gensim
- Topic models in scikit-learn