Cleaning data with regular expressions
Cleaning data often involves text transformations. Some, such as adding or removing a set of static strings, are simple. Others, such as parsing a complex data format like JSON or XML, require a complete parser. Many, however, fall within a middle range of complexity: they need more processing power than simple string manipulation, but full-fledged parsing is too much. For these tasks, regular expressions are often useful.
Regular expressions are probably the most basic and pervasive tool for cleaning data of any kind. Although they are sometimes overused, they are often the best tool for the job. Moreover, Clojure has a built-in syntax for compiled regular expressions, so they are convenient too.
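As a quick illustration of that built-in syntax, here is a minimal sketch of a regex literal and a few of the core functions that accept one. The name digit-run and the sample strings are made up for this illustration and are not part of the recipe.

```clojure
(require '[clojure.string :as string])

;; #"..." is a reader literal: the pattern is compiled when the code is
;; read, and the value is an ordinary java.util.regex.Pattern.
;; digit-run is a hypothetical name used only in this sketch.
(def digit-run #"\d+")

(class digit-run)
;; => java.util.regex.Pattern

;; re-find returns the first match (a string when there are no groups).
(re-find digit-run "call 555-1234")
;; => "555"

;; re-seq returns a lazy sequence of every match.
(re-seq digit-run "call 555-1234")
;; => ("555" "1234")

;; re-matches succeeds only if the whole string matches; with capture
;; groups it returns a vector of the full match followed by each group.
(re-matches #"(\d{3})-(\d{4})" "555-1234")
;; => ["555-1234" "555" "1234"]

;; clojure.string/replace accepts a regex for simple cleanups, such as
;; stripping every non-digit character.
(string/replace "555-1234" #"\D" "")
;; => "5551234"
```

Because the literal is an ordinary Pattern object, it can be bound to a name and reused, as digit-run is here, rather than being recompiled on each call.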
In this recipe, we'll write a function that normalizes U.S. phone numbers.
Getting ready
For this recipe, we will only require a very basic project.clj file. It should have these lines:
(defproject cleaning-data "...