Topic modeling descriptions
Another way to gain a better understanding of the descriptions is to use topic modeling. We learned about this text mining and machine learning technique in Chapter 3, Topic Modeling – Changing Concerns in the State of the Union Addresses. Here, we'll see whether it can generate topics from these descriptions and draw out the differences, trends, and patterns in this set of texts.
First, we'll create a new namespace to handle our topic modeling. We'll use the src/ufo_data/tm.clj file. The following is the namespace declaration for it:
(ns ufo-data.tm
  (:require [clojure.java.io :as io]
            [clojure.string :as str]
            [clojure.pprint :as pp])
  (:import [cc.mallet.util.*]
           [cc.mallet.types InstanceList]
           [cc.mallet.pipe
            Input2CharSequence TokenSequenceLowercase
            CharSequence2TokenSequence SerialPipes
            TokenSequenceRemoveStopwords
            TokenSequence2FeatureSequence]
           ...