Smoothing variables to decrease variation
We saw in the last chapter how to use Incanter Zoo to work with time series data and how to smooth values using a running mean. However, sometimes we'll want to smooth data that doesn't have a time component. For instance, we may want to track the usage of a word throughout a larger document or set of documents.
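To make the idea concrete, here is a minimal sketch of one way to track a word's usage across a text: split the tokens into fixed-size windows and compute the word's relative frequency in each window. This is not necessarily the approach the recipe takes. The names tokenize, windowed-freqs, and sample-text are hypothetical, introduced only for illustration, and the sketch uses nothing beyond clojure.core and clojure.string.

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper: lower-case the text and extract runs of letters.
(defn tokenize [text]
  (re-seq #"[a-z]+" (str/lower-case text)))

;; Hypothetical helper: relative frequency of `word` in each
;; non-overlapping window of `n` tokens. A shorter trailing window
;; is kept and divided by its own length.
(defn windowed-freqs [word n tokens]
  (map (fn [window]
         (/ (count (filter #(= word %) window))
            (double (count window))))
       (partition-all n tokens)))

;; Made-up sample text, used here in place of a real document.
(def sample-text
  "The baker, the baker! Then the baker came home alone.")

(windowed-freqs "baker" 4 (tokenize sample-text))
;; => (0.5 0.25 0.0)
```

Each value plays the role that a single time step played in the time series case, with position in the document standing in for time. Larger windows give a smoother but coarser picture of how the word's usage changes.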
Getting ready
For this, we'll need the usual dependencies in our Leiningen project.clj file:
(defproject statim "0.1.0"
  :dependencies [[org.clojure/clojure "1.6.0"]
                 [incanter "1.5.5"]])
We'll also require those in our script or REPL:
(require '[incanter.core :as i]
         '[incanter.stats :as s]
         '[incanter.charts :as c]
         '[clojure.string :as str])
For this recipe, we'll look at Sir Arthur Conan Doyle's Sherlock Holmes stories. You can download the text from Project Gutenberg at http://www.gutenberg.org/cache/epub/1661/pg1661.txt or from http://www.ericrochester.com/clj-data-analysis/data/pg1661.txt.
How to do it…
We'll look at the distribution of baker over...