Working with vectors
There are subtle yet powerful differences between Breeze vectors and Scala's own scala.collection.immutable.Vector. As we'll see in this recipe, Breeze vectors have many functions that are specific to linear algebra. More importantly, Breeze's vector is a Scala wrapper over netlib-java, and most calls to the vector's API are delegated to it.
Vectors are one of the core components in Breeze. They are containers of homogeneous data. In this recipe, we'll first see how to create vectors and then move on to various data manipulation functions to modify those vectors.
This recipe has been organized in the form of the following sub-recipes:
- Creating vectors:
  - Creating a vector from values
  - Creating a zero vector
  - Creating a vector out of a function
  - Creating a vector of linearly spaced values
  - Creating a vector with values in a specific range
  - Creating an entire vector with a single value
  - Slicing a sub-vector from a bigger vector
  - Creating a Breeze vector from a Scala vector
- Vector arithmetic:
  - Scalar operations
  - Calculating the dot product of a vector
  - Creating a new vector by adding two vectors together
- Appending vectors and converting a vector of one type to another:
  - Concatenating two vectors
  - Converting a vector of int to a vector of double
- Computing basic statistics:
  - Mean and variance
  - Standard deviation
  - Find the largest value
  - Finding the sum, square root and log of all the values in the vector
Getting ready
In order to run the code, you could use either the Scala REPL or the Worksheet feature available in the Eclipse Scala plugin (or Scala IDE) or in IntelliJ IDEA. These options are suggested because of their quick turnaround time.
How to do it...
Let's look at each of the above sub-recipes in detail. For easier reference, the output of the respective command is shown as well. All the classes that are being used in this recipe are from the breeze.linalg package. So, an "import breeze.linalg._" statement at the top of your file would be perfect.
Creating vectors
Let's look at the various ways we could construct vectors. Most of these construction mechanisms are through the apply method of the vector. There are two different flavors of vector, breeze.linalg.DenseVector and breeze.linalg.SparseVector, and the choice between them depends on the use case. The general rule of thumb is that if at least 20 percent of your data is zeroes, you are better off choosing SparseVector, though the 20 percent figure is itself only a guideline.
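The rule of thumb above can be checked on sample data before choosing a vector type. The following is a minimal sketch in plain Scala (no Breeze needed); the helper names zeroFraction and suggestedType, and the default cutoff of 0.2, are assumptions made for illustration and are not part of the Breeze API.

```scala
object SparsityCheck {
  // Hypothetical helper: fraction of entries that are exactly zero.
  def zeroFraction(values: Array[Double]): Double =
    if (values.isEmpty) 0.0
    else values.count(_ == 0.0).toDouble / values.length

  // Hypothetical helper: applies the 20 percent rule of thumb.
  // The threshold is a tunable assumption, not a Breeze constant.
  def suggestedType(values: Array[Double], threshold: Double = 0.2): String =
    if (zeroFraction(values) >= threshold) "SparseVector" else "DenseVector"

  def main(args: Array[String]): Unit = {
    val data = Array(0.0, 1.0, 0.0, 2.0, 0.0)
    println(zeroFraction(data))  // 3 of the 5 values are zero
    println(suggestedType(data)) // above the cutoff, so SparseVector
  }
}
```

In practice the right cutoff depends on the vector size and the operations you run, so treat the result as a starting point for measurement.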
Constructing a vector from values
- Creating a dense vector from values: Creating a DenseVector from values is just a matter of passing the values to the apply method:
val dense=DenseVector(1,2,3,4,5)
println (dense) //DenseVector(1, 2, 3, 4, 5)
- Creating a sparse vector from values: Creating a SparseVector from values is also through passing the values to the apply method:
val sparse=SparseVector(0.0, 1.0, 0.0, 2.0, 0.0)
println (sparse) //SparseVector((0,0.0), (1,1.0), (2,0.0), (3,2.0), (4,0.0))
Notice how the SparseVector stores values against the index.
Obviously, there are simpler ways to create a vector instead of just throwing all the data into its apply method.
Creating a zero vector
Calling the vector's zeros function would create a zero vector. While the numeric types would return a 0, the object types would return null and the Boolean types would return false:
val denseZeros=DenseVector.zeros[Double](5) //DenseVector(0.0, 0.0, 0.0, 0.0, 0.0)
val sparseZeros=SparseVector.zeros[Double](5) //SparseVector()
Not surprisingly, the SparseVector does not allocate any memory for the contents of the vector. However, the SparseVector object itself still occupies memory.
Creating a vector out of a function
The tabulate function in vector is an interesting and useful function. It accepts a size argument just like the zeros function, but it also accepts a function that we could use to populate the values for the vector. The function could be anything ranging from a random number generator to a naïve index-based generator, which we have implemented here. Notice how the return value of the function (Int) could be converted into a vector of Double by using the type parameter:
val denseTabulate=DenseVector.tabulate[Double](5)(index=>index*index) //DenseVector(0.0, 1.0, 4.0, 9.0, 16.0)
Creating a vector of linearly spaced values
The linspace function in breeze.linalg creates a new Vector[Double] of linearly spaced values between two arbitrary numbers. Not surprisingly, it accepts three arguments: the start, the end, and the total number of values that we would like to generate. Please note that both the start and the end values are included in the generated vector:
val spaceVector=breeze.linalg.linspace(2, 10, 5) //DenseVector(2.0, 4.0, 6.0, 8.0, 10.0)
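To make the spacing explicit, here is a minimal plain-Scala sketch of the arithmetic behind linearly spaced values with both endpoints included. The function linspaceSketch is a hypothetical stand-in written for illustration; it is not Breeze's implementation, and it assumes at least two points are requested.

```scala
object LinspaceSketch {
  // Hypothetical stand-in for the idea behind linspace:
  // count points from start to end, both endpoints included.
  def linspaceSketch(start: Double, end: Double, count: Int): Array[Double] = {
    require(count >= 2, "need at least two points to include both endpoints")
    // count points leave (count - 1) equal gaps between start and end.
    val step = (end - start) / (count - 1)
    Array.tabulate(count)(i => start + i * step)
  }

  def main(args: Array[String]): Unit = {
    // Same arguments as the Breeze call above.
    println(linspaceSketch(2, 10, 5).mkString(", ")) // 2.0, 4.0, 6.0, 8.0, 10.0
  }
}
```

The key point is the divisor: five values leave four gaps, so the step is (10 - 2) / 4 = 2.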
Creating a vector with values in a specific range
The range function in a vector has two variants. The plain vanilla function accepts a start and end value (start inclusive, end exclusive):
val allNosTill10=DenseVector.range(0, 10) //DenseVector(0, 1, 2, 3, 4, 5, 6, 7, 8, 9)
The other variant is an overloaded function that accepts a "step" value:
val evenNosTill20=DenseVector.range(0, 20, 2) // DenseVector(0, 2, 4, 6, 8, 10, 12, 14, 16, 18)
Just like the range function, which has all the arguments as integers, there is also a rangeD function that takes the start, stop, and the step parameters as Double:
val rangeD=DenseVector.rangeD(0.5, 20, 2.5) // DenseVector(0.5, 3.0, 5.5, 8.0, 10.5, 13.0, 15.5)
Creating an entire vector with a single value
Filling an entire vector with the same value is child's play. We just say HOW BIG is this vector going to be and then WHAT value. That's it.
val denseJust2s=DenseVector.fill(10, 2) // DenseVector(2, 2, 2, 2, 2, 2, 2, 2, 2, 2)
Slicing a sub-vector from a bigger vector
Choosing a part of the vector from a previous vector is just a matter of calling the slice method on the bigger vector. The parameters to be passed are the start index, end index, and an optional "step" parameter. The step parameter adds the step value for every iteration until it reaches the end index. Note that the end index is excluded in the sub-vector:
val allNosTill10=DenseVector.range(0, 10) //DenseVector(0, 1, 2, 3, 4, 5, 6, 7, 8, 9)
val fourThroughSevenIndexVector= allNosTill10.slice(4, 7) //DenseVector(4, 5, 6)
val twoThroughNineSkip2IndexVector= allNosTill10.slice(2, 9, 2) //DenseVector(2, 4, 6)
Creating a Breeze Vector from a Scala Vector
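A Breeze vector object's apply method could even accept a Scala Vector as a parameter. Note from the output that the Scala Vector is stored as a single element of the resulting DenseVector rather than being unpacked into its values. To copy the individual elements instead, convert the Scala Vector to an array first, for example DenseVector(collection.immutable.Vector(1,2,3,4).toArray):
val vectFromArray=DenseVector(collection.immutable.Vector(1,2,3,4)) // DenseVector(Vector(1, 2, 3, 4))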
Vector arithmetic
Now let's look at the basic arithmetic that we could do on vectors with scalars and vectors.
Scalar operations
Operations with scalars work just as we would expect, propagating the value to each element in the vector.
Adding a scalar to each element of the vector is done using the + function (surprise!):
val inPlaceValueAddition=evenNosTill20 +2 //DenseVector(2, 4, 6, 8, 10, 12, 14, 16, 18, 20)
Similarly, the other basic arithmetic operations (subtraction, multiplication, and division) involve calling the respective functions named after the universally accepted symbols (-, *, and /):
//Scalar subtraction
val inPlaceValueSubtraction=evenNosTill20 -2 //DenseVector(-2, 0, 2, 4, 6, 8, 10, 12, 14, 16)
//Scalar multiplication
val inPlaceValueMultiplication=evenNosTill20 *2 //DenseVector(0, 4, 8, 12, 16, 20, 24, 28, 32, 36)
//Scalar division
val inPlaceValueDivision=evenNosTill20 /2 //DenseVector(0, 1, 2, 3, 4, 5, 6, 7, 8, 9)
Calculating the dot product of two vectors
Each vector object has a function called dot, which accepts another vector of the same length as a parameter.
Let's fill in just 2s to a new vector of length 5:
val justFive2s=DenseVector.fill(5, 2) //DenseVector(2, 2, 2, 2, 2)
We'll create another vector from 0 to 5 with a step value of 1 (a fancy way of saying 0 through 4):
val zeroThrough4=DenseVector.range(0, 5, 1) //DenseVector(0, 1, 2, 3, 4)
Here's the dot function:
val dotVector=zeroThrough4.dot(justFive2s) //Int = 20
As expected, the function complains if we pass in a vector of a different length as a parameter to the dot product: Breeze throws an IllegalArgumentException if we do that. The full exception message is:
java.lang.IllegalArgumentException: Vectors must be the same length!
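For readers who want to see the arithmetic behind the result of 20, here is a minimal plain-Scala sketch of a dot product with the same length check. The function dotSketch is a hypothetical illustration on arrays, not the Breeze implementation (which delegates to optimized routines).

```scala
object DotSketch {
  // Hypothetical stand-in: sum of the element-wise products.
  def dotSketch(a: Array[Int], b: Array[Int]): Int = {
    // require throws java.lang.IllegalArgumentException when the check fails.
    require(a.length == b.length, "Vectors must be the same length!")
    a.zip(b).map { case (x, y) => x * y }.sum
  }

  def main(args: Array[String]): Unit = {
    val zeroThrough4 = Array.range(0, 5) // 0, 1, 2, 3, 4
    val justFive2s = Array.fill(5)(2)    // 2, 2, 2, 2, 2
    // 0*2 + 1*2 + 2*2 + 3*2 + 4*2 = 20
    println(dotSketch(zeroThrough4, justFive2s))
  }
}
```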
Creating a new vector by adding two vectors together
The + function is overloaded to accept a vector other than the scalar we saw previously. The operation does a corresponding element-by-element addition and creates a new vector:
val evenNosTill20=DenseVector.range(0, 20, 2) //DenseVector(0, 2, 4, 6, 8, 10, 12, 14, 16, 18)
val denseJust2s=DenseVector.fill(10, 2) //DenseVector(2, 2, 2, 2, 2, 2, 2, 2, 2, 2)
val additionVector=evenNosTill20 + denseJust2s // DenseVector(2, 4, 6, 8, 10, 12, 14, 16, 18, 20)
There's an interesting behavior encapsulated in the addition though. Assuming you try to add two vectors of different lengths, if the first vector is smaller and the second vector larger, the resulting vector would be the size of the first vector and the rest of the elements in the second vector would be ignored!
val fiveLength=DenseVector(1,2,3,4,5) //DenseVector(1, 2, 3, 4, 5)
val tenLength=DenseVector.fill(10, 20) //DenseVector(20, 20, 20, 20, 20, 20, 20, 20, 20, 20)
fiveLength+tenLength //DenseVector(21, 22, 23, 24, 25)
On the other hand, if the first vector is larger and the second vector smaller, it would result in an ArrayIndexOutOfBoundsException:
tenLength+fiveLength // java.lang.ArrayIndexOutOfBoundsException: 5
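Because silent truncation is easy to miss, it can be worth checking lengths before adding. The sketch below uses plain Scala arrays to contrast a truncating addition with a checked one. Both function names are hypothetical, and unlike the Breeze behavior described above, Scala's zip truncates to the shorter input regardless of argument order.

```scala
object AddSketch {
  // Hypothetical: element-wise sum that silently stops at the shorter input,
  // because zip pairs elements only while both sides have one.
  def addTruncating(a: Array[Int], b: Array[Int]): Array[Int] =
    a.zip(b).map { case (x, y) => x + y }

  // Hypothetical: the same sum, but mismatched lengths fail fast.
  def addChecked(a: Array[Int], b: Array[Int]): Array[Int] = {
    require(a.length == b.length, s"length mismatch: ${a.length} vs ${b.length}")
    addTruncating(a, b)
  }

  def main(args: Array[String]): Unit = {
    val fiveLength = Array(1, 2, 3, 4, 5)
    val tenLength = Array.fill(10)(20)
    println(addTruncating(fiveLength, tenLength).mkString(", ")) // 21, 22, 23, 24, 25
    // addChecked(fiveLength, tenLength) would throw IllegalArgumentException
  }
}
```

A guard like the require call turns both mismatch cases into the same early, descriptive failure.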
Appending vectors and converting a vector of one type to another
Let's briefly see how to append two vectors and convert vectors of one numeric type to another.
Concatenating two vectors
There are two variants of concatenation. There is a vertcat function that just vertically concatenates an arbitrary number of vectors; the size of the resulting vector is the sum of the sizes of all the vectors combined:
val justFive2s=DenseVector.fill(5, 2) //DenseVector(2, 2, 2, 2, 2)
val zeroThrough4=DenseVector.range(0, 5, 1) //DenseVector(0, 1, 2, 3, 4)
val concatVector=DenseVector.vertcat(zeroThrough4, justFive2s) //DenseVector(0, 1, 2, 3, 4, 2, 2, 2, 2, 2)
No surprise here. There is also the horzcat method that places the second vector horizontally next to the first vector, thus forming a matrix.
val concatVector1=DenseVector.horzcat(zeroThrough4, justFive2s)
//breeze.linalg.DenseMatrix[Int]
//0  2
//1  2
//2  2
//3  2
//4  2
Note
While dealing with vectors of different length, the vertcat function happily arranges the second vector at the bottom of the first vector. Not surprisingly, the horzcat function throws a java.lang.IllegalArgumentException, meaning all vectors must be of the same size!
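The difference between the two concatenations can be pictured with plain Scala collections. This is only a sketch of the resulting shapes, with hypothetical function names; Breeze itself returns a DenseVector for vertcat and a DenseMatrix for horzcat.

```scala
object ConcatSketch {
  // Hypothetical: vertical concatenation appends one sequence to the other.
  def vertcatSketch(a: Array[Int], b: Array[Int]): Array[Int] = a ++ b

  // Hypothetical: horizontal concatenation treats each input as a column,
  // so every output row holds one element from each input.
  def horzcatSketch(a: Array[Int], b: Array[Int]): Array[Array[Int]] = {
    require(a.length == b.length, "all vectors must be of the same size")
    a.zip(b).map { case (x, y) => Array(x, y) }
  }

  def main(args: Array[String]): Unit = {
    val zeroThrough4 = Array.range(0, 5)
    val justFive2s = Array.fill(5)(2)
    println(vertcatSketch(zeroThrough4, justFive2s).mkString(", "))
    // Prints one row per line: "0 2", "1 2", and so on.
    horzcatSketch(zeroThrough4, justFive2s).foreach(row => println(row.mkString(" ")))
  }
}
```

The sketch also shows why only the horizontal variant needs equal lengths: appending works for any sizes, but rows cannot be formed when one column runs out.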
Converting a vector of Int to a vector of Double
The conversion of one type of vector into another is not automatic in Breeze. However, there is a simple way to achieve this:
val evenNosTill20Double=breeze.linalg.convert(evenNosTill20, Double)
Computing basic statistics
Other than the creation and the arithmetic operations that we saw previously, there are some interesting summary statistics operations that are available in the library. Let's look at them now:
Note
Needs import of breeze.linalg._, breeze.numerics._, and breeze.stats._ (the last one for meanAndVariance and stddev). The operations in this section aim to simulate NumPy's UFunc, or universal functions.
Now, let's briefly look at how to calculate some basic summary statistics for a vector.
Mean and variance
Calculating the mean and variance of a vector could be achieved by calling the meanAndVariance universal function in the breeze.stats package. Note that this needs a vector of Double:
meanAndVariance(evenNosTill20Double) //MeanAndVariance(9.0,36.666666666666664,10)
Note
As you may have guessed, converting an Int vector to a Double vector and calculating the mean and variance for that vector could be merged into a one-liner:
meanAndVariance(convert(evenNosTill20, Double))
Standard deviation
Calling the stddev function on a Double vector gives the standard deviation:
stddev(evenNosTill20Double) //Double = 6.0553007081949835
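The figures reported above (mean 9.0, variance of about 36.67, standard deviation of about 6.06) are consistent with the sample variance, which divides by n - 1 rather than n. The following plain-Scala sketch reproduces that arithmetic. The function names are hypothetical, and the simple two-pass computation is an assumption for illustration, not necessarily how Breeze computes these values internally.

```scala
object StatsSketch {
  def mean(xs: Array[Double]): Double = xs.sum / xs.length

  // Sample variance: squared deviations from the mean, divided by (n - 1).
  def sampleVariance(xs: Array[Double]): Double = {
    require(xs.length >= 2, "sample variance needs at least two values")
    val m = mean(xs)
    xs.map(x => (x - m) * (x - m)).sum / (xs.length - 1)
  }

  // Standard deviation is the square root of the variance.
  def sampleStdDev(xs: Array[Double]): Double = math.sqrt(sampleVariance(xs))

  def main(args: Array[String]): Unit = {
    // Same values as evenNosTill20Double: 0, 2, 4, ..., 18
    val evens = Array.range(0, 20, 2).map(_.toDouble)
    println(mean(evens))           // 90 / 10 = 9.0
    println(sampleVariance(evens)) // 330 / 9, roughly 36.67
    println(sampleStdDev(evens))   // roughly 6.06
  }
}
```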
Find the largest value in a vector
The max universal function inside the breeze.linalg package would help us find the maximum value in a vector:
val intMaxOfVectorVals=max (evenNosTill20) //18
Finding the sum, square root and log of all the values in the vector
The same as with max, the sum universal function inside the breeze.linalg package calculates the sum of the vector:
val intSumOfVectorVals=sum (evenNosTill20) //90
The functions sqrt, log, and various other universal functions in the breeze.numerics package calculate the square root and log values of all the individual elements inside the vector:
The Sqrt function
val sqrtOfVectorVals= sqrt (evenNosTill20) // DenseVector(0.0, 1.4142135623730951, 2.0, 2.449489742783178, 2.8284271247461903, 3.1622776601683795, 3.4641016151377544, 3.7416573867739413, 4.0, 4.242640687119285)
The Log function
val log2VectorVals=log(evenNosTill20) // DenseVector(-Infinity, 0.6931471805599453, 1.3862943611198906, 1.791759469228055, 2.0794415416798357, 2.302585092994046, 2.4849066497880004, 2.6390573296152584, 2.772588722239781, 2.8903717578961645)
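A universal function of this kind applies a scalar function to every element and returns a collection of the same length. As a rough plain-Scala analogy (the function names here are hypothetical, and map on an Array only mimics the element-wise behavior, not Breeze's UFunc machinery):

```scala
object ElementwiseSketch {
  // Hypothetical: element-wise square root, one output per input.
  def sqrtAll(xs: Array[Int]): Array[Double] = xs.map(x => math.sqrt(x.toDouble))

  // Hypothetical: element-wise natural logarithm.
  // math.log(0.0) is negative infinity, which explains the first output entry above.
  def logAll(xs: Array[Int]): Array[Double] = xs.map(x => math.log(x.toDouble))

  def main(args: Array[String]): Unit = {
    val evens = Array.range(0, 20, 2) // 0, 2, 4, ..., 18
    println(sqrtAll(evens).mkString(", "))
    println(logAll(evens).mkString(", "))
  }
}
```

Note that the log here is the natural logarithm, which matches the values shown in the output above (for example, 0.693... for an input of 2).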