Source code
The Scala programming language is used to implement and evaluate the machine learning techniques presented in this book. Only a subset of the source code used to implement the techniques is presented in the book. The formal implementation of these algorithms is available on the website of Packt Publishing (http://www.packtpub.com).
Tip
Downloading the example code
You can download the example code files for all Packt books you have purchased from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.
Context versus view bounds
Most Scala classes discussed in the book are parameterized with the type associated with a discrete/categorical value (Int) or a continuous value (Double). Context bounds, written here as upper type bounds (<:), would require that any type used by the client code has Int or Double as an upper bound:
class MyClassInt[T <: Int]
class MyClassFloat[T <: Double]
Such a design forces the client to inherit from simple types and to deal with covariance and contravariance for container types [1:9].
For this book, view bounds are used instead of context bounds because they only require an implicit conversion to the parameterized type to be defined:
class MyClassFloat[T <% Double]
implicit def T2Double(t: T): Double
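The following is a minimal sketch of the idea, not the book's implementation. The <% syntax is specific to Scala 2, where a view bound is shorthand for an implicit parameter of type T => Double; the sketch spells that parameter out explicitly. The names ViewBoundSketch, Celsius, Stats, and meanTemp are hypothetical and introduced only for illustration.

```scala
object ViewBoundSketch {
  // Hypothetical client type: it wraps a Double but does not inherit from it
  final case class Celsius(value: Double)

  // The only thing the client has to provide: a conversion Celsius => Double
  implicit val celsius2Double: Celsius => Double = _.value

  // Explicit form of `class Stats[T <% Double](values: Array[T])`
  class Stats[T](values: Array[T])(implicit toDouble: T => Double) {
    require(values.nonEmpty, "Cannot compute the mean of an empty array")

    // Each element is converted to Double through the supplied conversion
    def mean: Double = values.map(toDouble).sum / values.length
  }

  // T is inferred as Celsius; the conversion is picked up implicitly
  val meanTemp: Double = new Stats(Array(Celsius(18.0), Celsius(22.0))).mean
}
```

The class never constrains Celsius to be a subtype of Double; it only asks that a conversion be in scope, which is the flexibility the paragraph above describes.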
Presentation
To keep the implementation of the algorithms readable, all nonessential code such as error checking, comments, exceptions, or imports is omitted. The following code elements are discarded in the code snippets presented in the book:
- Code comments
- Validation of class parameters and method arguments:
class BaumWelchEM(val lambda: HMMLambda ...) {
  require(lambda != null, "Lambda model is undefined")
  ...
}
- Exceptions and an exception handler:
try { .. }
catch {
  case e: ArrayIndexOutOfBoundsException => println(e.toString)
}
- Nonessential annotation:
@inline def mean = ..
- Logging and debugging code:
m_logger.debug( …)
- Private and nonessential methods
Primitives and implicits
The algorithms presented in this book share the same primitive types, generic operators, and implicit conversions.
Primitive types
For the sake of readability of the code, the following primitive types will be used:
type XY = (Double, Double)
type XYTSeries = Array[(Double, Double)]
type DMatrix[T] = Array[Array[T]]
type DVector[T] = Array[T]
type DblMatrix = DMatrix[Double]
type DblVector = Array[Double]
These types have the behavior (methods) of their primitive counterpart (array). However, adding new functionality to vectors, matrices, and time series requires classes in their own right. These classes will be introduced in the next chapter.
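As a small illustration of why the aliases keep the behavior of arrays, the sketch below redeclares three of them and uses ordinary Array methods on the aliased values. The object name AliasSketch and the helper norm1 are hypothetical and exist only for this example.

```scala
object AliasSketch {
  type DMatrix[T] = Array[Array[T]]
  type DblMatrix = DMatrix[Double]
  type DblVector = Array[Double]

  // An alias is only another name: the value is still a plain Array
  val v: DblVector = Array(1.0, 2.0, 3.0)
  val doubled: DblVector = v.map(_ * 2.0)

  // A matrix is an array of rows, so row-wise operations are Array operations
  val m: DblMatrix = Array(Array(1.0, 2.0), Array(3.0, 4.0))
  val rowSums: DblVector = m.map(_.sum)

  // A plain Array[Double] is accepted wherever a DblVector is expected
  def norm1(x: DblVector): Double = x.map(d => math.abs(d)).sum
  val n: Double = norm1(Array(-1.0, 2.0))
}
```

Because nothing new can be attached to an alias, any vector- or matrix-specific method has to live in a separate class, as the text notes.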
Type conversions
Implicit conversion is an important feature of the Scala programming language because it allows developers to specify a type conversion for an entire library in a single place. Here are a few of the implicit type conversions used throughout the book:
implicit def int2Double(n: Int): Double = n.toDouble
implicit def vectorT2DblVector[T <% Double](vt: DVector[T]): DblVector =
  vt.map(t => t.toDouble)
implicit def double2DblVector(x: Double): DblVector = Array[Double](x)
implicit def dblPair2DbLVector(x: (Double, Double)): DblVector =
  Array[Double](x._1, x._2)
implicit def dblPairs2DblRows(x: (Double, Double)): DblMatrix =
  Array[Array[Double]](Array[Double](x._1, x._2))
...
Note
Library-specific conversion
The conversions between the primitive types listed here and the types introduced in a particular library (such as Apache Commons Math) are declared in later chapters, the first time those libraries are used.
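To show what the implicit type conversions buy at the call site, here is a short sketch that reuses two of the conversions listed above. The object name ConversionSketch and the function sumOfSquares are hypothetical; only the two implicit definitions come from the text.

```scala
object ConversionSketch {
  import scala.language.implicitConversions

  type DblVector = Array[Double]

  // Two of the conversions listed in the text, unchanged
  implicit def double2DblVector(x: Double): DblVector = Array[Double](x)
  implicit def dblPair2DbLVector(x: (Double, Double)): DblVector =
    Array[Double](x._1, x._2)

  // Hypothetical function that only accepts a DblVector
  def sumOfSquares(v: DblVector): Double = v.map(x => x * x).sum

  // A scalar is wrapped into a one-element vector by double2DblVector
  val fromScalar: Double = sumOfSquares(3.0)

  // A pair is turned into a two-element vector by dblPair2DbLVector
  val fromPair: Double = sumOfSquares((3.0, 4.0))
}
```

The function is written once against DblVector, and the compiler inserts the conversion for each argument type, so no overload per input type is needed.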
Operators
Lastly, some operations are applied by multiple machine learning or preprocessing algorithms. They need to be defined implicitly. The operation on a pair of a vector of arbitrary type and a vector of Double is defined as follows:
def Op[T <% Double](v: DVector[T], w: DblVector, op: (T, Double) => Double): DblVector =
  v.zipWithIndex.map(x => op(x._1, w(x._2)))
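A usage sketch for this operation follows. It restates Op with the view bound written as an explicit implicit parameter and declares its own Int => Double conversion, so that it does not depend on Scala 2 specific syntax. The object name OpSketch, the conversion intToDouble, and the sample values are assumptions made for the example.

```scala
object OpSketch {
  type DVector[T] = Array[T]
  type DblVector = Array[Double]

  // Implicit-parameter form of `Op[T <% Double]` from the text
  def Op[T](v: DVector[T], w: DblVector, op: (T, Double) => Double)
           (implicit toDouble: T => Double): DblVector =
    v.zipWithIndex.map(x => op(x._1, w(x._2)))

  // Conversion for T = Int, declared explicitly for this sketch
  implicit val intToDouble: Int => Double = _.toDouble

  val counts: DVector[Int] = Array(1, 2, 3)
  val weights: DblVector = Array(0.5, 0.25, 2.0)

  // Element-wise product of an Int vector and a Double vector
  val weighted: DblVector = Op(counts, weights, (n: Int, x: Double) => n * x)
}
```

Note that w is indexed by position, so it must be at least as long as v; a shorter w raises an ArrayIndexOutOfBoundsException.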
It is also convenient to define the following operators, which are not included in the Scala standard library:
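// Divides each element of the vector by n and returns a new vector
implicit def /(v: DblVector, n: Int): DblVector = v.map(x => x/n)
// Divides each element of m(col) by z in place, then returns the matrix
implicit def /(m: DblMatrix, col: Int, z: Double): DblMatrix = {
  (0 until m(col).size).foreach(i => m(col)(i) /= z)
  m
}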
We won't have to redefine the types, conversions, and operators from now on.
Immutability
It is usually a good idea to reduce the number of states of an object. Method invocation transitions an object from one state to another. The larger the number of methods or states, the more cumbersome the testing process becomes.
There is no point in creating a model that is not defined (trained). Making the training of a model part of the constructor of the class that implements it therefore makes a lot of sense. As a result, the only public methods of a machine learning algorithm are:
- Classification or prediction
- Validation
- Retrieval of model parameters (weights, latent variables, hidden states, and so on), if needed
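The design described above can be sketched as follows. This is an illustrative toy model, not one of the book's algorithms: the class ThresholdClassifier, its mean-based training rule, and the sample data are all hypothetical.

```scala
object ImmutableModelSketch {
  // Toy classifier: training happens in the constructor, so an
  // untrained instance cannot exist and no state changes afterwards
  final class ThresholdClassifier(xt: Array[Double]) {
    require(xt.nonEmpty, "Cannot train on an empty data set")

    // Model parameter, computed once during construction
    private val threshold: Double = xt.sum / xt.length

    // Classification or prediction
    def classify(x: Double): Int = if (x > threshold) 1 else 0

    // Validation: fraction of labeled observations classified correctly
    def validate(labeled: Array[(Double, Int)]): Double = {
      require(labeled.nonEmpty, "Cannot validate on an empty data set")
      labeled.count { case (x, y) => classify(x) == y }.toDouble / labeled.length
    }

    // Retrieval of the model parameter
    def parameter: Double = threshold
  }

  val model = new ThresholdClassifier(Array(1.0, 2.0, 3.0, 6.0))
  val accuracy: Double = model.validate(Array((0.0, 0), (4.0, 1)))
}
```

With no setter or separate train method, the object has a single state after construction, which keeps the number of cases to test small.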
Performance of Scala iterators
The evaluation of the performance of Scala higher-order iterative methods is beyond the scope of this book. However, it is important to be aware of the trade-offs of each method.
The for loop construct is to be avoided as a counting iterator unless it is used in conjunction with yield. It is designed to implement the for-comprehension monad (map-flatMap). The source code presented in this book uses the while and foreach constructs.
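The three constructs mentioned above can be compared side by side in the short sketch below. The object LoopSketch and its methods are hypothetical, and no performance figures are implied.

```scala
object LoopSketch {
  val xs: Array[Double] = Array(1.0, 2.0, 3.0, 4.0)

  // while: explicit counter, no closure involved
  def sumWhile(a: Array[Double]): Double = {
    var i = 0
    var s = 0.0
    while (i < a.length) {
      s += a(i)
      i += 1
    }
    s
  }

  // foreach: the loop body is passed as a function
  def sumForeach(a: Array[Double]): Double = {
    var s = 0.0
    a.foreach(x => s += x)
    s
  }

  // for with yield: a comprehension that builds a new collection
  // (the compiler rewrites it as a call to map)
  def squares(a: Array[Double]): Array[Double] = for (x <- a) yield x * x
}
```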
The Scala reducer methods reduce and fold are also frequently used for their efficiency.
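For reference, a brief sketch of the two reducer styles on an array of Double. The object ReducerSketch and the sample values are hypothetical; foldLeft is used as the fold variant because it takes an explicit initial value and a fixed traversal order.

```scala
object ReducerSketch {
  val xs: Array[Double] = Array(1.0, 2.0, 3.0, 4.0)

  // reduce: combines the elements pairwise, no initial value
  val total: Double = xs.reduce(_ + _)
  val largest: Double = xs.reduce((a, b) => if (a > b) a else b)

  // foldLeft: starts from an initial value, so the accumulator
  // can carry something other than a plain running element
  val sumOfSquares: Double = xs.foldLeft(0.0)((acc, x) => acc + x * x)

  // foldLeft is defined on an empty collection; reduce is not
  val emptySum: Double = Array.empty[Double].foldLeft(0.0)(_ + _)
}
```

Calling reduce on an empty collection throws an UnsupportedOperationException, which is one reason to prefer a fold when the input may be empty.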