Chapter 3. Topic Modeling – Changing Concerns in the State of the Union Addresses
One of the largest sources of data today is the unstructured, natural-language text found everywhere on the Internet. Think of all the news articles, blog posts, Twitter posts, and YouTube comments, as well as the thousands of other ways that people create and share textual content online. What they're saying may be important to you, and being able to track the subjects they're talking about is a useful way to stay aware of trends and conversations.
One tool for exploring what a group of text documents discusses is topic modeling. This is a technique for identifying the "topics" discussed in a collection of documents, although, as we'll see, "topics" is defined a little differently here than in informal conversation. The strength of these models is that they don't assume each document talks about only one thing. Instead, they model documents as collections of topics. This...