Vectors are one of the core components in Breeze. They are containers of homogeneous data. In this recipe, we'll first see how to create vectors and then move on to various data manipulation functions to modify those vectors.
This recipe is organized into sub-recipes covering vector creation, vector arithmetic, appending and converting vectors, and computing basic statistics. Let's look at each of them in detail. For easier reference, the output of the respective command is shown as well. Unless noted otherwise, all the classes used in this recipe come from the breeze.linalg package, so an "import breeze.linalg._" statement at the top of your file is all you need.
Let's look at the various ways we can construct vectors. Many of these construction mechanisms go through the apply method of the vector's companion object. There are two flavors of vector, breeze.linalg.DenseVector and breeze.linalg.SparseVector, and the choice between them depends on the use case. A general rule of thumb is that if at least 20 percent of your data is zeroes, you are better off choosing SparseVector, although that 20 percent threshold is not fixed either.
Constructing a vector from values
- Creating a dense vector from values: Creating a DenseVector from values is just a matter of passing the values to the apply method.
- Creating a sparse vector from values: Creating a SparseVector from values also works by passing the values to the apply method.
Notice how the SparseVector stores values against the index.
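A minimal script-style sketch of both constructors follows. It assumes a recent Breeze release (for example 2.x) is on the classpath, and the values are arbitrary. Reading by index works the same way for both kinds of vector.

```scala
import breeze.linalg._

// DenseVector: every value is stored in one contiguous array
val dense = DenseVector(1.0, 2.0, 3.0, 4.0)
println(dense) // DenseVector(1.0, 2.0, 3.0, 4.0)

// SparseVector: values are kept against their indices
val sparse = SparseVector(0.0, 1.0, 0.0, 2.0, 0.0)
println(sparse)

// Both are read by index in the same way
println(dense(2))  // 3.0
println(sparse(3)) // 2.0
```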
Of course, there are simpler ways to create a vector than passing all the data to its apply method.
Calling the zeros function on the vector's companion object creates a zero vector. Numeric types are filled with 0, object types with null, and Boolean types with false:
As you might expect, a zero SparseVector does not allocate any memory for its contents. Only the SparseVector object itself takes up memory.
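The sketch below creates zero vectors in script style, assuming Breeze is on the classpath. It also checks SparseVector's activeSize (the number of stored entries) to show that a sparse zero vector holds nothing.

```scala
import breeze.linalg._

// Numeric element types are filled with zero
val denseZeros = DenseVector.zeros[Double](5)
println(denseZeros) // DenseVector(0.0, 0.0, 0.0, 0.0, 0.0)

// Boolean element types are filled with false
val boolZeros = DenseVector.zeros[Boolean](3)

// A sparse zero vector knows its length but stores no entries
val sparseZeros = SparseVector.zeros[Double](5)
println(sparseZeros.length)     // 5
println(sparseZeros.activeSize) // 0
```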
Creating a vector out of a function
The tabulate function on a vector is an interesting and useful function. It accepts a size argument just like the zeros function, but it also accepts a function that we can use to populate the values of the vector. That function could be anything from a random number generator to a naïve index-based generator, which we have implemented here. Notice how the return value of the function (Int) can be converted into a vector of Double by using the type parameter:
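Here is a small sketch of an index-based generator, assuming Breeze is on the classpath. The explicit [Double] type parameter makes the Int result of i * i widen to Double.

```scala
import breeze.linalg._

// tabulate calls the generator once per index (0 until 5)
val squares = DenseVector.tabulate[Double](5)(i => i * i)
println(squares) // DenseVector(0.0, 1.0, 4.0, 9.0, 16.0)
```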
Creating a vector of linearly spaced values
The linspace function in breeze.linalg creates a new Vector[Double] of linearly spaced values between two arbitrary numbers. Not surprisingly, it accepts three arguments: the start, the end, and the total number of values that we would like to generate. Note that both the start and the end values are included in the generated values:
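A short sketch, assuming Breeze is on the classpath. It asks for five values from 2.0 to 10.0, so both endpoints appear in the result.

```scala
import breeze.linalg._

// start = 2.0, end = 10.0, five values in total (both ends included)
val evenlySpaced = linspace(2.0, 10.0, 5)
println(evenlySpaced) // DenseVector(2.0, 4.0, 6.0, 8.0, 10.0)
```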
Creating a vector with values in a specific range
The range function on a vector has two variants. The plain vanilla function accepts a start and an end value (start inclusive, end exclusive):
The other variant is an overloaded function that accepts a "step" value:
While range takes all its arguments as integers, there is also a rangeD function that takes the start, end, and step parameters as Double values:
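The three range variants side by side, as a script-style sketch assuming Breeze is on the classpath. The rangeD arguments are arbitrary, and its output is printed without a claimed result.

```scala
import breeze.linalg._

// Start is included, end is excluded
val zeroToNine = DenseVector.range(0, 10)
println(zeroToNine) // DenseVector(0, 1, 2, 3, 4, 5, 6, 7, 8, 9)

// Same bounds with a step of 2
val evens = DenseVector.range(0, 10, 2)
println(evens) // DenseVector(0, 2, 4, 6, 8)

// Double-valued variant: start, end and step are all Doubles
val halves = DenseVector.rangeD(0.5, 3.0, 0.5)
println(halves)
```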
Creating an entire vector with a single value
Filling an entire vector with the same value is child's play. We just specify how big the vector is going to be and then which value to fill it with. That's it.
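In Breeze this is the fill function on the companion object: the size goes in the first argument list and the value in the second. A sketch, assuming Breeze is on the classpath:

```scala
import breeze.linalg._

// How big (5), then what value (2.0)
val fiveTwos = DenseVector.fill(5)(2.0)
println(fiveTwos) // DenseVector(2.0, 2.0, 2.0, 2.0, 2.0)
```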
Slicing a sub-vector from a bigger vector
Extracting part of a vector is just a matter of calling the slice method on the bigger vector. It takes a start index, an end index, and an optional "step" parameter. The index advances by the step value on each iteration until it reaches the end index. Note that the end index is excluded from the sub-vector:
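A sketch of slice with and without a step, assuming Breeze is on the classpath:

```scala
import breeze.linalg._

val zeroToNine = DenseVector.range(0, 10)

// Indices 0 to 4; the end index 5 is excluded
val firstFive = zeroToNine.slice(0, 5)
println(firstFive) // DenseVector(0, 1, 2, 3, 4)

// Every second element, starting at 0 and stopping before index 10
val everyOther = zeroToNine.slice(0, 10, 2)
println(everyOther) // DenseVector(0, 2, 4, 6, 8)
```

A Breeze slice is a view over the original vector's data, so writing into firstFive also changes zeroToNine. Call .copy on the slice when you need an independent vector.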
Creating a Breeze Vector from a Scala Vector
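A Breeze vector object's apply method can even accept a Scala Vector. Keep in mind that passing the Scala Vector as-is makes it a single element of the result (a DenseVector[Vector[Int]]). To get a vector of its elements, expand it as varargs or convert it to an Array first: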
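A sketch, assuming Breeze is on the classpath. After import breeze.linalg._, the bare name Vector refers to Breeze's own Vector, so the Scala collection is written with its full name.

```scala
import breeze.linalg._

// Full name needed: "Vector" alone now means breeze.linalg.Vector
val scalaVector = scala.collection.immutable.Vector(1, 2, 3, 4)

// Expanding the collection as varargs gives a DenseVector[Int] of four elements
val fromScalaVector = DenseVector(scalaVector: _*)
println(fromScalaVector) // DenseVector(1, 2, 3, 4)

// Converting to an Array first works as well
val viaArray = DenseVector(scalaVector.toArray)
```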
Now let's look at the basic arithmetic that we can do on vectors with scalars and with other vectors.
Operations with scalars work just as we would expect, propagating the value to each element in the vector.
Adding a scalar to each element of the vector is done using the + function (surprise!):
Similarly, the other basic arithmetic operations (subtraction, multiplication, and division) involve calling the respective functions named after the universally accepted symbols (-, *, and /):
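All four scalar operators in one script-style sketch, assuming Breeze is on the classpath. Each operator returns a new vector and leaves the original untouched.

```scala
import breeze.linalg._

val v = DenseVector(1.0, 2.0, 3.0, 4.0)

// The scalar is applied to every element
val plus2  = v + 2.0 // DenseVector(3.0, 4.0, 5.0, 6.0)
val minus2 = v - 2.0 // DenseVector(-1.0, 0.0, 1.0, 2.0)
val times2 = v * 2.0 // DenseVector(2.0, 4.0, 6.0, 8.0)
val div2   = v / 2.0 // DenseVector(0.5, 1.0, 1.5, 2.0)

println(v) // DenseVector(1.0, 2.0, 3.0, 4.0)
```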
Calculating the dot product of two vectors
Each vector object has a function called dot, which accepts another vector of the same length as a parameter.
Let's fill a new vector of length 5 with 2s. We'll then create another vector from 0 to 5 with a step value of 1 (a fancy way of saying 0 through 4).
Here's the dot function in action:
As you would expect, the function complains if we pass in a vector of a different length: Breeze throws an IllegalArgumentException in that case.
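The dot product steps can be put together in a script-style sketch, assuming Breeze is on the classpath. The length-mismatch call is wrapped in try/catch so the script keeps running.

```scala
import breeze.linalg._

// A vector of length 5 filled with 2s
val justFive2s = DenseVector.fill(5)(2)
// 0 until 5 with a step of 1, i.e. 0 through 4
val zeroThrough4 = DenseVector.range(0, 5, 1)

// (0 + 1 + 2 + 3 + 4) * 2 = 20
val dotProduct = zeroThrough4.dot(justFive2s)
println(dotProduct) // 20

// A length-3 argument fails Breeze's length check
val mismatchRejected =
  try { zeroThrough4.dot(DenseVector.fill(3)(2)); false }
  catch { case _: IllegalArgumentException => true }
println(mismatchRejected) // true
```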
Creating a new vector by adding two vectors together
The + function is overloaded to accept a vector as well as the scalar we saw previously. The operation performs element-by-element addition and creates a new vector:
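A sketch with two vectors of equal length, assuming Breeze is on the classpath:

```scala
import breeze.linalg._

val a = DenseVector(1.0, 2.0, 3.0)
val b = DenseVector(10.0, 20.0, 30.0)

// Element-by-element addition into a brand new vector
val total = a + b
println(total) // DenseVector(11.0, 22.0, 33.0)
```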
There's an interesting behavior hidden in the addition, though, at least in the Breeze version this recipe was written against. Suppose you add two vectors of different lengths. If the first vector is the shorter one, the result has the size of the first vector and the remaining elements of the second vector are silently ignored. If, on the other hand, the first vector is the longer one, the addition throws an ArrayIndexOutOfBoundsException. Other Breeze releases may check the lengths and reject the mismatch up front, so it is safest to make sure both vectors have the same length before adding them.
Appending vectors and converting a vector of one type to another
Let's briefly see how to append two vectors and convert vectors of one numeric type to another.
Concatenating two vectors
There are two variants of concatenation. The vertcat function vertically concatenates an arbitrary number of vectors, so the size of the result is the sum of the sizes of all the vectors combined:
No surprise here. There is also the horzcat method, which places the second vector horizontally next to the first vector, thus forming a matrix.
Note
When dealing with vectors of different lengths, the vertcat function happily places the second vector below the first one. Not surprisingly, the horzcat function throws a java.lang.IllegalArgumentException, because all vectors must be of the same size!
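Both concatenation functions in a script-style sketch, assuming Breeze is on the classpath. The failing horzcat call is wrapped in try/catch.

```scala
import breeze.linalg._

val first = DenseVector(1, 2, 3)
val second = DenseVector(4, 5, 6)

// vertcat: one longer vector
val stacked = DenseVector.vertcat(first, second)
println(stacked) // DenseVector(1, 2, 3, 4, 5, 6)

// vertcat also accepts vectors of different lengths
val uneven = DenseVector.vertcat(first, DenseVector(7, 8))
println(uneven.length) // 5

// horzcat: each vector becomes a column of a 3 x 2 DenseMatrix
val sideBySide = DenseVector.horzcat(first, second)
println(s"${sideBySide.rows} x ${sideBySide.cols}") // 3 x 2

// horzcat insists on equal lengths
val horzcatRejected =
  try { DenseVector.horzcat(first, DenseVector(7, 8)); false }
  catch { case _: IllegalArgumentException => true }
```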
Converting a vector of Int to a vector of Double
The conversion of one type of vector into another is not automatic in Breeze. However, there is a simple way to achieve this:
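One way to do this is breeze.linalg.convert, which takes the vector and the target numeric type. A plain element-wise map is shown as an alternative. This is a sketch, assuming Breeze is on the classpath:

```scala
import breeze.linalg._

val ints = DenseVector(1, 2, 3, 4)

// convert maps every element to the requested numeric type
val doubles = convert(ints, Double)
println(doubles) // DenseVector(1.0, 2.0, 3.0, 4.0)

// An element-wise map also yields a DenseVector[Double]
val doublesViaMap = ints.map(_.toDouble)
```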
Computing basic statistics
Besides the creation and arithmetic operations that we saw previously, the library offers some interesting summary statistics operations. Let's look at them now:
Note
These operations need imports of breeze.linalg._ and breeze.numerics._. The operations in the Other operations section aim to simulate NumPy's UFuncs, or universal functions.
Now, let's briefly look at how to calculate some basic summary statistics for a vector.
Calculating the mean and variance of a vector can be done by calling the meanAndVariance universal function in the breeze.stats package. Note that this needs a vector of Double:
Note
As you may have guessed, converting an Int vector to a Double vector and calculating the mean and variance of that vector can be merged into a one-liner:
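A sketch of both forms, assuming a recent Breeze release where meanAndVariance returns a value with mean and variance fields. The variance it reports is the sample variance (divided by n - 1).

```scala
import breeze.linalg._
import breeze.stats._

// Needs Double elements
val evens = DenseVector(2.0, 4.0, 6.0, 8.0)
val mv = meanAndVariance(evens)
println(mv.mean)     // 5.0
println(mv.variance) // sample variance: (9 + 1 + 1 + 9) / 3

// One-liner: convert an Int range (0, 2, ..., 18) to Double, then summarize
val fromInts = meanAndVariance(convert(DenseVector.range(0, 20, 2), Double))
println(fromInts.mean) // 9.0
```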
Calling stddev on a Double vector gives the standard deviation:
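A sketch, assuming Breeze is on the classpath. Because stddev is the square root of the sample variance, the test compares it against sqrt(20 / 3) for these four values.

```scala
import breeze.linalg._
import breeze.stats._

val evens = DenseVector(2.0, 4.0, 6.0, 8.0)

// Square root of the sample variance
val sd = stddev(evens)
println(sd)
```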
Finding the largest value in a vector
The max universal function in the breeze.linalg package finds the maximum value in a vector:
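A sketch, assuming Breeze is on the classpath. It also uses argmax from the same package, which returns the index of the largest value rather than the value itself.

```scala
import breeze.linalg._

val readings = DenseVector(3.0, 9.0, 1.0, 7.0)

val largest = max(readings)
println(largest) // 9.0

// Index of the largest value
val largestAt = argmax(readings)
println(largestAt) // 1
```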
Finding the sum, square root and log of all the values in the vector
As with max, the sum universal function in the breeze.linalg package calculates the sum of the vector's elements:
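A sketch, assuming Breeze is on the classpath:

```scala
import breeze.linalg._

val v = DenseVector(1.0, 2.0, 3.0, 4.0)

// 1 + 2 + 3 + 4
val total = sum(v)
println(total) // 10.0
```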
The functions sqrt, log, and various other universal functions in the breeze.numerics package calculate the square root, logarithm, and so on, of each individual element inside the vector:
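A sketch with both imports mentioned in the note above, assuming Breeze is on the classpath. Each function returns a new vector of the same length. Here log is the natural logarithm.

```scala
import breeze.linalg._
import breeze.numerics._

// Square root of each element
val roots = sqrt(DenseVector(1.0, 4.0, 9.0, 16.0))
println(roots) // DenseVector(1.0, 2.0, 3.0, 4.0)

// Natural logarithm of each element
val logs = log(DenseVector(1.0, 10.0, 100.0))
println(logs)
```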