Inspecting the data
We encountered categorical variables in the previous chapter as the dichotomous variable "sex" in the athlete dataset. That dataset also contained many other categorical variables including "sport", "event", and "country".
Let's take a look at the Titanic dataset (using the clojure.java.io library to access the file resource and the incanter.io library to read it in):
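;; Assumes the aliases io (clojure.java.io), iio (incanter.io), and i (incanter.core).

;; Locate the resource, convert its URL to a string, and read it as a
;; tab-delimited dataset with a header row.
(defn load-data [file]
  (-> (io/resource file)
      (str)
      (iio/read-dataset :delim \tab :header true)))

;; Open the loaded dataset in Incanter's table viewer.
(defn ex-4-1 []
  (i/view (load-data :titanic)))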
The preceding code generates the following table:
The Titanic dataset includes categorical variables too, for example :sex, :pclass (the passenger class), and :embarked (a letter signifying the port of boarding). These are all string values, taking categories such as female, first, and C, but classes don't always have to be string values. Columns such as :ticket, :boat, and :body can be thought...
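One quick way to see which categories a column takes is to count how often each value occurs. The following is a minimal sketch in plain Clojure rather than Incanter: sample-rows is a hypothetical handful of rows made up for illustration (it is not taken from the Titanic file), and category-counts is a helper name introduced here, not part of any library.

```clojure
;; Hypothetical sample rows for illustration only; these values are
;; invented and do not come from the real Titanic dataset.
(def sample-rows
  [{:sex "female" :pclass "first"  :embarked "C"}
   {:sex "male"   :pclass "third"  :embarked "S"}
   {:sex "female" :pclass "second" :embarked "S"}
   {:sex "male"   :pclass "first"  :embarked "C"}])

(defn category-counts
  "Returns a map from each distinct value of column k to the number
  of rows in which it occurs."
  [k rows]
  (frequencies (map k rows)))

(category-counts :sex sample-rows)
;; => {"female" 2, "male" 2}
```

The same helper works for any column key, so (category-counts :pclass sample-rows) lists the passenger classes present in the sample along with their counts.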