You're reading from Scala for Machine Learning Leverage Scala and Machine Learning to construct and study systems that can learn from data

Product type Paperback

Published in Dec 2014

Publisher

ISBN-13 9781783558742

Length 624 pages

Edition 1st Edition

Languages

Scala

Concepts

Machine Learning

Author (1):

Patrick R. Nicolas

View More author details

Table of Contents (15) Chapters

Preface

1. Getting Started FREE CHAPTER

2. Hello World!

3. Data Preprocessing

4. Unsupervised Learning

5. Naïve Bayes Classifiers

6. Regression and Regularization

7. Sequential Data Models

8. Kernel Models and Support Vector Machines

9. Artificial Neural Networks

10. Genetic Algorithms

11. Reinforcement Learning

12. Scalable Frameworks

A. Basic Concepts

Index

Source code

The Scala programming language is used to implement and evaluate the machine learning techniques presented in this book. Only a subset of the source code used to implement the techniques are presented in the book. The formal implementation of these algorithms is available on the website of Packt Publishing (http://www.packtpub.com).

Tip

Downloading the example code

You can download the example code files for all Packt books you have purchased from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

Context versus view bounds

Most Scala classes discussed in the book are parameterized with the type associated to the discrete/categorical value (Int) or continuous value (Double). Context bounds would require that any type used by the client code has Int or Double as upper bounds:

class MyClassInt[T <: Int]
class MyClassFloat[T <: Double]

Such a design introduces constraints on the client to inherit from simple types and to deal with covariance and contravariance for container types [1:9].

For this book, view bounds are used instead of context bounds only where they require an implicit conversion to the parameterized type to be defined:

Class MyClassFloat[T <% Double]
implicit def T2Double(t : T): Double

Presentation

For the sake of readability of the implementation of algorithms, all nonessential code such as error checking, comments, exceptions, or imports are omitted. The following code elements are discarded in the code snippet presented in the book:

Code comments

Validation of class parameters and method arguments:

class BaumWelchEM(val lambda: HMMLambda ...) {
   require( lambda != null, "Lambda model is undefined")

Exceptions and an exception handler:

   try { .. }
   catch {
      case e: ArrayIndexOutOfBoundsException  =>println(e.toString)
    }

Nonessential annotation:
```
   @inline def mean = ..
```
Logging and debugging code:
```
       m_logger.debug( …)
```
Private and nonessential methods

Primitives and implicits

The algorithms presented in this book share the same primitive types, generic operators, and implicit conversions.

Primitive types

For the sake of readability of the code, the following primitive types will be used:

type XY = (Double, Double)
type XYTSeries = Array[(Double, Double)]
type DMatrix[T] = Array[Array[T]]
type DVector[T] = Array[T]  
type DblMatrix = DMatrix[Double]
type DblVector = Array[Double]

The types have the behavior (methods) of their primitive counterpart (array). However, adding a new functionality to vectors, matrices, and time series requires classes of their own right. These classes will be introduced in the next chapter.

Type conversions

Implicit conversion is an important feature of the Scala programming language because it allows developers to specify a type conversion for an entire library in a single place. Here are a few of the implicit type conversions used throughout the book:

implicit def int2Double(n: Int): Double = n.toDouble
implicit def vectorT2DblVector[T <% Double](vt: DVector[T]): DblVector = vt.map( t => t.toDouble)
implicit def double2DblVector(x: Double): DblVector = Array[Double](x)
implicit def dblPair2DbLVector(x: (Double, Double)): DblVector = Array[Double](x._1,x._2)
implicit def dblPairs2DblRows(x: (Double, Double)): DblMatrix = Array[Array[Double]](Array[Double](x._1, x._2))
...

Note

Library-specific conversion

The conversion between the primitive type listed here and types introduced in a particular library (such as Apache Commons Math) is declared in future chapters the first time those libraries are used.

Operators

Lastly, some operations are applied by multiple machine learning or preprocessing algorithms. They need to be defined implicitly. The operation on a pair of a vector of arbitrary type and vector of Double is defined as follows:

def Op[T <% Double](v: DVector[T], w: DblVector, op: (T, Double) => Double): DblVector = 
   v.zipWithIndex.map(x => op(x._1, w(x._2)))

It is also convenient to define the following operators that are included in the Scala standard library:

implicit def /(v: DblVector, n: Int):DblVector = v.map( x => x/n)
implicit def /(m: DblMatrix, col: Int, z: Double): DblMatrix = { (0 until m(n).size).foreach(i => m(n)(i) /= z)  }

We won't have to redefine the types, conversions, and operators from now on.

Immutability

It is usually a good idea to reduce the number of states of an object. Method invocation transitions an object from one state to another. The larger the number of methods or states, the more cumbersome the testing process becomes.

There is no point in creating a model that is not defined (trained). Therefore, making the training of a model as part of the constructor of the class it implements makes a lot of sense. Therefore, the only public methods of a machine learning algorithm are:

Classification or prediction
Validation
Retrieval of model parameters (weights, latent variables, hidden states, and so on), if needed

Performance of Scala iterators

The evaluation of the performance of Scala high-order iterative methods is beyond the scope of this book. However, it is important to be aware of the trade-off of each method.

The for loop construct is to be avoided as a counting iterator except if it is used in conjunction with yield. It is designed to implement the for-comprehension monad (map-flatMap). The source code presented in this book uses the while and foreach constructs.

Scala reducer methods reduce and fold are also frequently used for their efficiency.