Introduction
In the previous chapter, we learned about different ways to collect data from local files and online resources. In this chapter, we will focus on topic modeling, which is a popular concept within natural language processing. Topic modeling is a simple way to capture meaning from a collection of documents. Note that, in this case, documents are any coherent collection of words, which could be as short as a tweet or as long as an article, based on the project at hand.
A topic model captures information about the concepts contained in a set of texts. Using these concepts, documents can be organized into different categories or topics.
Topic modeling is mostly done using unsupervised learning algorithms, which detect topics on their own. Topic modeling algorithms operate by doing statistical analysis of words or tokens in documents, and then they use those statistics to automatically assign documents to topics.
In this chapter, we will look at a few popular topic...