Reading data from JDBC databases
Reading data from a relational database is only slightly more complicated than reading from Excel, and much of the extra complication involves connecting to the database.
Fortunately, there's a Clojure-contributed package that sits on top of JDBC (the Java database connector API, http://www.oracle.com/technetwork/java/javase/jdbc/index.html) and makes working with databases much easier. In this example, we'll load a table from an SQLite database (http://www.sqlite.org/), which stores the database in a single file.
Getting ready
First, list the dependencies in your Leiningen project.clj
file. We will also need to include the database driver library. For this example, it is org.xerial/sqlite-jdbc
:
(defproject getting-data "0.1.0-SNAPSHOT" :dependencies [[org.clojure/clojure "1.6.0"] [incanter "1.5.5"] [org.clojure/java.jdbc "0.3.3"] [org.xerial/sqlite-jdbc "3.7.15-M1"]])
Then, load the modules into your REPL or script file:
(require '[incanter.core :as i] '[clojure.java.jdbc :as j])
Finally, get the database connection information. I have my data in an SQLite database file named data/small-sample.sqlite
, as shown in the following screenshot. You can download this from http://www.ericrochester.com/clj-data-analysis/data/small-sample.sqlite.
How to do it…
Loading the data is not complicated, but we'll make it easier with a wrapper function:
- We'll create a function that takes a database connection map and a table name and returns a dataset created from this table:
(defn load-table-data "This loads the data from a database table." [db table-name] (i/to-dataset (j/query db (str "SELECT * FROM " table-name ";"))))
- Next, we define a database map with the connection parameters suitable for our database:
(defdb {:subprotocol "sqlite" :subname "data/small-sample.sqlite" :classname "org.sqlite.JDBC"})
- Finally, call
load-table-data
withdb
and a table name as a symbol or string:user=> (load-table-data db 'people) | :relation | :surname | :given_name | |-------------+----------+-------------| | father | Addams | Gomez | | mother | Addams | Morticia | | brother | Addams | Pugsley ||| …
How it works…
The load-table-data
function passes the database connection information directly through to clojure.java.jdbc/query.query
. It creates an SQL query that returns all of the fields in the table that is passed in. Each row of the result is a sequence of hashes mapping column names to data values. This sequence is wrapped in a dataset by incanter.core/to-dataset
.
See also
Connecting to different database systems using JDBC isn't necessarily a difficult task, but it's dependent on which database you wish to connect to. Oracle has a tutorial for how to work with JDBC at http://docs.oracle.com/javase/tutorial/jdbc/basics, and the documentation for the clojure.java.jdbc
library has some good information too (http://clojure.github.com/java.jdbc/). If you're trying to find out what the connection string looks like for a database system, there are lists available online. The list at http://www.java2s.com/Tutorial/Java/0340__Database/AListofJDBCDriversconnectionstringdrivername.htm includes the major drivers.