Tools and frameworks
Before getting your hands dirty, you need to download and deploy a minimum set of tools and libraries so as not to reinvent the wheel. A few key components have to be installed in order to compile and run the source code described throughout the book. We focus on open source and commonly available libraries, although you are invited to experiment with equivalent tools of your choice. The learning curve for the frameworks described here is minimal.
Java
The code described in the book has been tested with JDK 1.7.0_45 and JDK 1.8.0_25 on Windows x64 and Mac OS X x64. You need to install the Java Development Kit if you have not already done so. Finally, the environment variables JAVA_HOME, PATH, and CLASSPATH have to be updated accordingly.
Scala
The code has been tested with Scala 2.10.4. We recommend using Scala version 2.10.3 or higher and SBT 0.13 or higher. Let's assume that the Scala runtime (REPL) and libraries have been properly installed and that the environment variables SCALA_HOME and PATH have been updated. The description and installation instructions of the Scala plugin for Eclipse are available at http://scala-ide.org/docs/user/gettingstarted.html.
You can also download the Scala plugin for IntelliJ IDEA from the JetBrains website at http://confluence.jetbrains.com/display/SCA/.
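Once the JDK and Scala are installed, the versions and environment variables mentioned above can be checked from Scala itself. The following is only a minimal sketch: the object name EnvCheck and its methods are hypothetical helpers and are not part of the book's source code.

```scala
// Hypothetical helper (not from the book) that reports the JVM and Scala
// versions along with the environment variables discussed above.
object EnvCheck {
  // Returns the value of an environment variable, or a placeholder if it is unset.
  def envOrMissing(name: String): String =
    sys.env.getOrElse(name, "<not set>")

  // Collects one line per item so the result can be printed or inspected.
  def report: Seq[String] = Seq(
    "java.version = " + System.getProperty("java.version"),
    "scala version = " + scala.util.Properties.versionNumberString,
    "JAVA_HOME = " + envOrMissing("JAVA_HOME"),
    "SCALA_HOME = " + envOrMissing("SCALA_HOME")
  )

  def main(args: Array[String]): Unit = report.foreach(println)
}
```

Running EnvCheck.main(Array()) from the REPL prints the four lines; a value of <not set> suggests that the corresponding variable still needs to be defined.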
The ubiquitous Simple Build Tool (sbt) will be our primary building engine. The syntax of the build file sbt/build.sbt conforms to version 0.13, and is used to compile and assemble the source code presented throughout this book.
Apache Commons Math
Apache Commons Math is a Java library for numerical processing, algebra, statistics, and optimization [1:6].
Description
This is a lightweight library that provides developers with a foundation of small, ready-to-use Java classes that can be easily woven into a machine learning problem. The examples used throughout the book require version 3.3 or higher.
The main components of Apache Commons Math are:
- Functions, differentiation, integration, and ordinary differential equations
- Statistical distributions
- Linear and nonlinear optimization
- Dense and sparse vectors and matrices
- Curve fitting, correlation, and regression
For more information, visit http://commons.apache.org/proper/commons-math.
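To give a feel for how these Java classes can be called from Scala, here is a minimal sketch that touches two of the components listed above (statistics and correlation). It assumes commons-math3 version 3.3 is on the classpath; the object CommonsMathSketch and its method names are hypothetical and are not taken from the book's source code.

```scala
import org.apache.commons.math3.stat.descriptive.DescriptiveStatistics
import org.apache.commons.math3.stat.correlation.PearsonsCorrelation

// Hypothetical wrapper around two Commons Math classes.
object CommonsMathSketch {
  // Mean and sample standard deviation of a set of observations.
  def meanAndStdDev(xs: Array[Double]): (Double, Double) = {
    val stats = new DescriptiveStatistics
    xs.foreach(x => stats.addValue(x))
    (stats.getMean, stats.getStandardDeviation)
  }

  // Pearson correlation coefficient between two series of equal length.
  def correlation(xs: Array[Double], ys: Array[Double]): Double =
    new PearsonsCorrelation().correlation(xs, ys)
}
```

For instance, CommonsMathSketch.meanAndStdDev(Array(1.0, 2.0, 3.0, 4.0)) returns a mean of 2.5 together with the sample standard deviation of the four values.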
Licensing
The library is distributed under the Apache License 2.0; the terms are available at http://www.apache.org/licenses/LICENSE-2.0.
Installation
The installation and deployment of the Commons Math library are quite simple:
- Go to the download page, http://commons.apache.org/proper/commons-math/download_math.cgi.
- Download the latest .jar files in the Binaries section, commons-math3-3.3-bin.zip (for version 3.3, for instance).
- Unzip and install the .jar files.
- Add commons-math3-3.3.jar to classpath as follows:
  - For Mac OS X, use the command export CLASSPATH=$CLASSPATH:/Commons_Math_path/commons-math3-3.3.jar
  - For Windows, navigate to System property | Advanced system settings | Advanced | Environment variables…, then edit the entry of the CLASSPATH variable
- Add the commons-math3-3.3.jar file to your IDE environment if needed (that is, for Eclipse, navigate to Project | Properties | Java Build Path | Libraries | Add External JARs).
You can also download commons-math3-3.3-src.zip from the Source section.
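After updating the classpath, it can be useful to confirm that the .jar file is actually visible to the JVM. The sketch below is one possible way to do so; ClasspathCheck and its methods are hypothetical helpers, not part of the book or of Commons Math.

```scala
import java.io.File
import scala.util.Try

// Hypothetical helper (not from the book) for checking a classpath setup.
object ClasspathCheck {
  // True if one of the entries of the classpath string ends with the given jar name.
  // The separator defaults to the platform's own (':' on Mac OS X, ';' on Windows).
  def containsJar(classpath: String,
                  jarName: String,
                  separator: Char = File.pathSeparatorChar): Boolean =
    classpath.split(separator).exists(_.endsWith(jarName))

  // True if the JVM can load the class with the given fully qualified name.
  def isLoadable(className: String): Boolean =
    Try(Class.forName(className)).isSuccess

  def main(args: Array[String]): Unit = {
    val cp = System.getProperty("java.class.path")
    println("jar listed:  " + containsJar(cp, "commons-math3-3.3.jar"))
    println("class found: " + isLoadable("org.apache.commons.math3.util.FastMath"))
  }
}
```

The second check is the more reliable of the two, because IDEs and build tools may place the library on the classpath under a different file name or location.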
JFreeChart
JFreeChart is an open source chart and plotting Java library, widely used in the Java programmer community. It was originally created by David Gilbert [1:7].
Description
The library supports a variety of configurable plots and charts (scatter, dial, pie, area, bar, box and whisker, stacked, and 3D). We use JFreeChart to display the output of data processing and algorithms throughout the book, but you are encouraged to explore this great library on your own, as time permits.
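As an illustration of the kind of plot mentioned above, the following minimal sketch builds a scatter plot and writes it to a PNG file. It assumes JFreeChart 1.0.17 is on the classpath (in later 1.5.x releases, ChartUtilities was renamed ChartUtils); the object ScatterPlotSketch, its method names, the axis labels, and the image size are hypothetical choices, not the plotting code used in the book.

```scala
import java.io.File
import org.jfree.chart.{ChartFactory, ChartUtilities, JFreeChart}
import org.jfree.chart.plot.PlotOrientation
import org.jfree.data.xy.{XYSeries, XYSeriesCollection}

// Hypothetical wrapper around the JFreeChart 1.0.x API.
object ScatterPlotSketch {
  // Builds a scatter plot from a sequence of (x, y) points.
  def scatter(title: String, points: Seq[(Double, Double)]): JFreeChart = {
    val series = new XYSeries("data")
    points.foreach { case (x, y) => series.add(x, y) }
    val dataset = new XYSeriesCollection(series)
    // Arguments: title, x label, y label, data, orientation, legend, tooltips, URLs
    ChartFactory.createScatterPlot(
      title, "x", "y", dataset, PlotOrientation.VERTICAL, false, false, false)
  }

  // Renders the chart as a 640 x 480 PNG image at the given path.
  def save(chart: JFreeChart, path: String): Unit =
    ChartUtilities.saveChartAsPNG(new File(path), chart, 640, 480)
}
```

A call such as ScatterPlotSketch.save(ScatterPlotSketch.scatter("Sample", Seq((1.0, 2.0), (2.0, 3.5))), "sample.png") would write the image to the current directory.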
Licensing
It is distributed under the terms of the GNU Lesser General Public License (LGPL), which permits its use in proprietary applications.
Installation
To install and deploy JFreeChart, perform the following steps:
- Visit http://www.jfree.org/jfreechart.
- Download the latest version from SourceForge at http://sourceforge.net/projects/jfreechart/files.
- Unzip and install the .jar file.
- Add jfreechart-1.0.17.jar (for version 1.0.17) to classpath as follows:
  - For Mac OS X, update the classpath by using export CLASSPATH=$CLASSPATH:/JFreeChart_path/jfreechart-1.0.17.jar
  - For Windows, go to System property | Advanced system settings | Advanced | Environment variables… and then edit the entry of the CLASSPATH variable
- Add the jfreechart-1.0.17.jar file to your IDE environment, if needed.
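As an alternative to editing CLASSPATH by hand, both libraries can be declared as managed dependencies in an sbt build file. The fragment below is only a sketch of what such a build.sbt might look like under sbt 0.13; the project name and version are hypothetical, and the build file shipped with the book's source code may be organized differently.

```scala
// build.sbt: minimal sketch; the name and version values are hypothetical
name := "scala-ml-examples"

version := "0.1"

scalaVersion := "2.10.4"

// Maven Central coordinates of the two libraries described in this section
libraryDependencies ++= Seq(
  "org.apache.commons" % "commons-math3" % "3.3",
  "org.jfree" % "jfreechart" % "1.0.17"
)
```

With this approach, sbt downloads the .jar files and adds them to the compile and run classpath of the project, so no change to the CLASSPATH environment variable is needed for code built through sbt.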
Other libraries and frameworks
Libraries and tools that are specific to a single chapter are introduced along with the topic. Scalable frameworks are presented in the last chapter along with the instructions to download them. Libraries related to conditional random fields and support vector machines are described in their respective chapters.
Note
Why not use Scala algebra and numerical libraries?
Libraries such as Breeze, ScalaNLP, and Algebird are great Scala frameworks for linear algebra, numerical analysis, and machine learning. They provide even the most seasoned Scala programmer with a high-quality layer of abstraction. However, this book is designed as a tutorial that allows developers to write algorithms from the ground up using simple common Java libraries [1:8].