Finding data errors with Benford's law
Benford's law is a curious observation about the distribution of the first digits of numbers in many naturally occurring datasets. In sequences that conform to Benford's law, the first digit is 1 about 30 percent of the time, and higher digits occur progressively less often, with 9 leading less than 5 percent of the time. Manually constructed data rarely looks like this. Because of that, the lack of a Benford's law distribution is evidence that a dataset may have been manually constructed or altered.
For example, financial data has been shown to follow Benford's law, and investigators use this for fraud detection. The US Internal Revenue Service reportedly uses it to identify potential tax fraud, and financial auditors use it as well.
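To make the expected distribution concrete, Benford's law gives the probability of a leading digit d as log10(1 + 1/d). The following is a minimal sketch in plain Clojure, not the recipe's own code: the names benford-probability, first-digit, and first-digit-frequencies are hypothetical helpers introduced here for illustration, and first-digit assumes its inputs are integers or decimals (not ratios).

```clojure
;; Hypothetical helpers for illustration; they are not part of Incanter.

(defn benford-probability
  "Expected relative frequency of leading digit d (1-9) under Benford's law."
  [d]
  (Math/log10 (+ 1.0 (/ 1.0 d))))

(defn first-digit
  "Returns the first nonzero decimal digit in the printed form of x,
  or nil if there is none (for example, when x is 0).
  Assumes x is an integer or a decimal, not a ratio."
  [x]
  (when-let [c (first (filter #(<= (int \1) (int %) (int \9)) (str x)))]
    (- (int c) (int \0))))

(defn first-digit-frequencies
  "Maps each digit 1-9 to its observed share of leading digits in xs.
  Assumes xs contains at least one nonzero number."
  [xs]
  (let [digits (keep first-digit xs)
        n      (count digits)
        counts (frequencies digits)]
    (into (sorted-map)
          (for [d (range 1 10)]
            [d (/ (double (get counts d 0)) n)]))))

;; Expected share of leading 1s, roughly 0.301:
(benford-probability 1)

;; Observed shares for the first 100 powers of 2, a sequence commonly
;; cited as following Benford's law (1N keeps the arithmetic in BigInts):
(first-digit-frequencies (take 100 (iterate #(* 2 %) 1N)))
```

Comparing the observed map against benford-probability for each digit gives a quick visual check. A formal comparison would use a statistical test rather than eyeballing the two sets of numbers.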
Getting ready
We'll need these dependencies:
(defproject statim "0.1.0"
  :dependencies [[org.clojure/clojure "1.6.0"]
                 [incanter "1.5.5"]])
We'll also use these requirements:
(require '[incanter.core :as i] 'incanter.io '[incanter.stats :as s])
For data...