Loading CSV and ARFF files into Weka
Weka is most comfortable when using its own file format: the Attribute-Relation File Format (ARFF). An ARFF file declares the type of data in each column, and it carries other information that allows the data to be loaded incrementally. Both can be important features, and because of them, Weka can load ARFF data more reliably. However, Weka can still import CSV files, and when it does, it attempts to guess the type of data in each column.
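To make the difference concrete, here is a minimal sketch of what an ARFF file can look like. The relation name, attribute names, and data rows are hypothetical and chosen only for illustration; the point is that the header states each column's type, so Weka does not have to guess it as it would for a CSV file.

```
% Hypothetical example data; lines starting with % are comments.
@relation weather

% Each attribute line declares a column name and its type.
@attribute outlook {sunny, overcast, rainy}
@attribute temperature numeric
@attribute play {yes, no}

% The rows after @data are comma-separated, much like CSV.
@data
sunny,85,no
overcast,83,yes
rainy,70,yes
```

The same rows stored as plain CSV would carry only a header line of column names, leaving the types to be inferred from the values.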
In this recipe, we'll see what's necessary to load data from a CSV file and an ARFF file.
Getting ready
First, we'll need to add Weka to the dependencies in our Leiningen project.clj file:
(defproject d-mining "0.1.0-SNAPSHOT"
  :dependencies [[org.clojure/clojure "1.6.0"]
                 [nz.ac.waikato.cms.weka/weka-dev "3.7.11"]])
Then we'll import the right classes into our script or REPL:
(import [weka.core.converters ArffLoader CSVLoader]
        [java.io File])
Finally, we'll...