Running the experiment
Earlier, we defined functions that break a sequence of tokens into features of various sorts: unigrams, bigrams, trigrams, and POS-tagged unigrams. We can take these and automatically test both classifiers against each of these feature types. Let's see how.
First, we'll define some top-level variables that associate label keywords with the functions that we want to test at that point in the process (that is, classifiers or feature-generators):
(def classifiers
  {:naive-bayes a/k-fold-naive-bayes
   :maxent a/k-fold-logistic})

(def feature-factories
  {:unigram t/unigrams
   :bigram t/bigrams
   :trigram t/trigrams
   :pos (let [pos-model (t/read-me-tagger "data/en-pos-maxent.bin")]
          (fn [ts] (t/with-pos pos-model ts)))})
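To make the mechanics of these keyword-to-function maps concrete, here is a minimal, self-contained sketch of how two such maps can be crossed pairwise. Every name in it (toy-unigrams, toy-bigrams, toy-count, toy-distinct, run-all) is a hypothetical stand-in invented for illustration, not one of the chapter's functions, and the "classifiers" only count features so that the example runs without any training data.

```clojure
;; Hypothetical stand-ins for the real feature generators.
(defn toy-unigrams [tokens] (map vector tokens))
(defn toy-bigrams [tokens] (partition 2 1 tokens))

;; Hypothetical stand-ins for the classifiers: each one only summarizes
;; the features it receives instead of training a model.
(defn toy-count [features] (count features))
(defn toy-distinct [features] (count (distinct features)))

(def toy-feature-factories
  {:unigram toy-unigrams
   :bigram toy-bigrams})

(def toy-classifiers
  {:count toy-count
   :distinct toy-distinct})

(defn run-all
  "Applies every classifier to the output of every feature factory and
  returns a map keyed by [classifier-key feature-key]."
  [tokens]
  (into {}
        (for [[c-key c-fn] toy-classifiers
              [f-key f-fn] toy-feature-factories]
          [[c-key f-key] (c-fn (f-fn tokens))])))

(run-all ["the" "cat" "saw" "the" "cat"])
```

Keying the result by a vector of the two keywords keeps each score labeled with the combination that produced it, which is the main benefit of storing the functions in maps rather than in plain lists.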
We can now iterate over both of these hash maps and cross-validate these classifiers on these features. We'll average the error information (the precision and recall) for all of them and return the averages. Once we've executed...