Parsing dates and times
One difficult issue when normalizing and cleaning up data is how to deal with time. People enter dates and times in a bewildering variety of formats; some of them are ambiguous, and some of them are vague. However, we have to do our best to interpret them and normalize them into a standard format.
In this recipe, we'll define a function that attempts to parse a date into a standard string format. We'll use the clj-time Clojure library, which is a wrapper around the Joda-Time Java library (http://joda-time.sourceforge.net/).
Getting ready
First, we need to declare our dependencies in the Leiningen project.clj file:
(defproject cleaning-data "0.1.0-SNAPSHOT"
  :dependencies [[org.clojure/clojure "1.6.0"]
                 [clj-time "0.9.0-beta1"]])
Then, we need to load these dependencies into our script or REPL. We'll exclude extend and second from clj-time.core to keep them from clashing with clojure.core/extend and clojure.core/second:
(use '[clj-time.core :exclude (extend second)]
     '[clj-time.format])
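To confirm that both namespaces loaded, one option is to round-trip a date through one of clj-time's built-in formatters. The following is only a minimal sketch: the sample date is made up for illustration, and it assumes the clj-time dependency declared above is on the classpath. It uses formatters, parse, and unparse from clj-time.format, and date-time and year from clj-time.core.

```clojure
;; Load the namespaces as in the recipe, excluding the two names
;; that would otherwise shadow clojure.core/extend and clojure.core/second.
(use '[clj-time.core :exclude (extend second)]
     '[clj-time.format])

;; (formatters :date) is the built-in ISO date formatter (yyyy-MM-dd).
(def iso-date (formatters :date))

;; Format a known date (illustrative value) back into a string.
(def formatted (unparse iso-date (date-time 2012 9 12)))

;; Parse a string into a DateTime and pull out the year.
(def parsed-year (year (parse iso-date "2012-09-12")))

(println formatted)    ; prints 2012-09-12
(println parsed-year)  ; prints 2012
```

If both lines print without an exception, the libraries are available. Because second was excluded, the unqualified name still refers to clojure.core/second.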