The main advantage of NumPy is the speed of common array operations compared to standard Python operations. For instance, consider a traditional summation of 10,000,000 elements:
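A minimal sketch of such a comparison, assuming timings are taken with the standard-library time.perf_counter. The measured times depend on the machine, so no figures are shown:

```python
import time
import numpy as np

n = 10_000_000
values = list(range(n))               # plain Python list
arr = np.arange(n, dtype=np.int64)    # NumPy array holding the same values

start = time.perf_counter()
total_py = 0
for v in values:                      # traditional element-by-element summation
    total_py += v
py_seconds = time.perf_counter() - start

start = time.perf_counter()
total_np = int(arr.sum())             # vectorized summation in compiled code
np_seconds = time.perf_counter() - start

print(f"Python loop: {py_seconds:.3f} s, NumPy: {np_seconds:.3f} s")
```

The int64 dtype is set explicitly so the total cannot overflow on platforms whose default integer is 32 bits.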
The array object is the main feature provided by the NumPy library. Arrays are similar to Python lists, but every element of an array has the same numerical type (typically float or int). An array can be created from a list with the array function. Here two arguments are passed to it: the list to be converted and the data type of the new array:
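A short illustration with made-up values, passing float as the type argument:

```python
import numpy as np

# Convert a list to an array, forcing the element type to float
arr = np.array([1, 4, 5, 8], float)
print(arr)         # [1. 4. 5. 8.]
print(type(arr))   # <class 'numpy.ndarray'>
```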
Conversely, an array can be converted back into a list with the tolist method:
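A minimal example, reusing the same illustrative values:

```python
import numpy as np

arr = np.array([1, 4, 5, 8], float)
lst = arr.tolist()   # plain Python list of Python floats
print(lst)           # [1.0, 4.0, 5.0, 8.0]
```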
Note
Assigning an array to a new variable does not create a copy in memory; it only binds the new name to the same original object.
To create a new object from an existing one, the copy function needs to be used:
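The sketch below contrasts plain assignment with the copy method on an illustrative array:

```python
import numpy as np

arr1 = np.array([1, 2, 3], float)
arr2 = arr1          # same object: no new memory is allocated
arr3 = arr1.copy()   # independent copy with its own data

arr1[0] = 0
print(arr2)  # [0. 2. 3.]  changed together with arr1
print(arr3)  # [1. 2. 3.]  unaffected
```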
An array can also be filled with a single value in the following way:
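One way to do this, assuming the ndarray fill method is what is meant, is sketched here:

```python
import numpy as np

arr = np.array([10, 20, 30, 40], float)
arr.fill(0)   # overwrite every element in place with the same value
print(arr)    # [0. 0. 0. 0.]
```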
Arrays can also be created randomly using the random submodule. For example, given an integer n as input, the permutation function returns a random ordering of the integers from 0 to n-1:
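A brief example with n = 5. The output differs from run to run, so it is not shown:

```python
import numpy as np

perm = np.random.permutation(5)   # the integers 0..4 in random order
print(perm)
```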
Another method, normal, will draw a sequence of numbers from a normal distribution:
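A minimal call drawing five values; the random output is omitted:

```python
import numpy as np

samples = np.random.normal(0, 1, 5)   # mean 0, standard deviation 1, 5 draws
print(samples)
```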
Here 0 is the mean of the distribution, 1 is the standard deviation, and 5 is the number of elements to draw. For a uniform distribution, the random function returns numbers between 0 and 1, with 1 excluded:
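A short sketch drawing five uniform values (output varies per run):

```python
import numpy as np

u = np.random.random(5)   # 5 draws from the uniform distribution on [0, 1)
print(u)
```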
NumPy also provides a number of functions for creating two-dimensional arrays (matrices). For instance, to create an identity matrix of a given dimension, the following code can be used:
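For example, with the identity function and an illustrative size of 3:

```python
import numpy as np

I = np.identity(3)   # 3x3 identity matrix of floats
print(I)
# [[1. 0. 0.]
#  [0. 1. 0.]
#  [0. 0. 1.]]
```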
The eye function returns a matrix with ones along the kth diagonal (k=0 is the main diagonal, positive k lies above it and negative k below it):
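A sketch with a 4x4 matrix and k set to 1 and -1, values chosen only for illustration:

```python
import numpy as np

upper = np.eye(4, k=1)    # ones on the diagonal just above the main one
lower = np.eye(4, k=-1)   # ones on the diagonal just below the main one
print(upper)
# [[0. 1. 0. 0.]
#  [0. 0. 1. 0.]
#  [0. 0. 0. 1.]
#  [0. 0. 0. 0.]]
```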
The most commonly used functions to create new arrays (one- or two-dimensional) are zeros and ones, which create arrays of the specified dimensions filled with zeros and ones, respectively:
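A minimal example; the shapes and the int dtype are illustrative choices:

```python
import numpy as np

z = np.zeros(6)                  # 1-D array of six float zeros
o = np.ones((2, 3), dtype=int)   # 2x3 matrix of integer ones
print(z)   # [0. 0. 0. 0. 0. 0.]
print(o)
# [[1 1 1]
#  [1 1 1]]
```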
The zeros_like and ones_like functions instead create a new array with the same shape and data type as an existing one:
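A short sketch using an illustrative 2x3 float array as the template:

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]], float)
z = np.zeros_like(arr)   # shape (2, 3), dtype float64, all zeros
o = np.ones_like(arr)    # shape (2, 3), dtype float64, all ones
```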
Another way to create two-dimensional arrays is to stack one-dimensional arrays vertically with vstack:
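For instance, with two illustrative vectors of equal length:

```python
import numpy as np

a = np.array([1, 3, 2])
b = np.array([5, 4, 6])
m = np.vstack([a, b])   # each 1-D array becomes one row
print(m)
# [[1 3 2]
#  [5 4 6]]
```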
Two-dimensional arrays can also be created from distributions using the random submodule. For example, a 2x3 random matrix drawn from a uniform distribution between 0 and 1 is created by the following command:
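A plausible form of that command is np.random.rand with the two dimensions as arguments (output varies per run):

```python
import numpy as np

m = np.random.rand(2, 3)   # 2x3 matrix, uniform on [0, 1)
print(m)
```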
Another often used distribution is the multivariate normal distribution:
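A sketch using the mean vector and covariance matrix described below; the sampled values are random and not shown:

```python
import numpy as np

samples = np.random.multivariate_normal([10, 0], [[3, 1], [1, 4]], size=5)
print(samples.shape)   # (5, 2): five samples, each a 2-D point
```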
Here the list [10, 0] is the mean vector, [[3, 1], [1, 4]] is the covariance matrix, and 5 is the number of samples to draw.
All the usual operations for accessing, slicing, and modifying a Python list apply to arrays in the same or a similar way:
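A few of these list-style operations on an illustrative array:

```python
import numpy as np

arr = np.array([2., 6., 5., 5.])
first_three = arr[:3]   # slicing
third = arr[2]          # indexing
last = arr[-1]          # negative index counts from the end
arr[0] = 5.             # item assignment modifies the array in place
print(arr)              # [5. 6. 5. 5.]
```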
The unique values of an array can be obtained with unique:
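A minimal example with repeated illustrative values:

```python
import numpy as np

arr = np.array([5, 6, 5, 2, 6, 5])
u = np.unique(arr)   # distinct values, returned sorted
print(u)             # [2 5 6]
```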
The values of an array can also be sorted with sort, and argsort returns the indices that would sort it:
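A short sketch showing both functions and the in-place sort method:

```python
import numpy as np

arr = np.array([3., 1., 2.])
idx = np.argsort(arr)   # indices that would sort arr: [1 2 0]
s = np.sort(arr)        # sorted copy: [1. 2. 3.]
arr.sort()              # the method version sorts arr itself in place
```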
It is also possible to randomly rearrange the order of an array's elements using the shuffle function of the random submodule, which modifies the array in place:
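For example (the resulting order is random, so it is not shown):

```python
import numpy as np

arr = np.arange(6)
np.random.shuffle(arr)   # reorders arr in place and returns None
print(arr)
```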
NumPy also has a built-in function, array_equal, that checks whether two arrays have the same shape and the same elements:
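A minimal comparison with illustrative arrays:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([1, 2, 3])
c = np.array([1, 2, 4])
print(np.array_equal(a, b))   # True
print(np.array_equal(a, c))   # False
```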
Multi-dimensional arrays, however, differ from lists: the indices for all dimensions are separated by commas inside a single pair of brackets, instead of one pair of brackets per level as with nested lists. For example, the elements of a two-dimensional array (that is, a matrix) are accessed in the following way:
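The sketch below puts the array syntax next to the equivalent nested-list lookup, using illustrative values:

```python
import numpy as np

matrix = np.array([[4, 5], [6, 1]], float)
value = matrix[0, 1]          # row 0, column 1 in a single pair of brackets
nested = [[4, 5], [6, 1]]
list_value = nested[0][1]     # the equivalent lookup on a nested list
print(value, list_value)      # 5.0 5
```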
Slicing is applied to each dimension by placing a colon (:) between the start value and the end value of the slice (the end value itself is excluded):
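For example, selecting a 2x2 block from an illustrative 3x3 matrix:

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
sub = arr[0:2, 1:3]   # rows 0-1 and columns 1-2
print(sub)
# [[2 3]
#  [5 6]]
```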
A single colon (:) selects all the elements along that axis:
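A short sketch selecting a full row and a full column:

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])
row = arr[1, :]   # whole second row:   [4 5 6]
col = arr[:, 2]   # whole third column: [3 6]
```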
One-dimensional arrays can be obtained from multi-dimensional arrays using the flatten method:
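For instance, on an illustrative 2x3 matrix:

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])
flat = arr.flatten()   # new 1-D copy, read row by row
print(flat)            # [1 2 3 4 5 6]
```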
It is also possible to inspect an array object to obtain information about its content. The size of each dimension of an array is given by the shape attribute:
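A minimal example with an illustrative two-row, three-column array:

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]], float)
print(arr.shape)   # (2, 3)
```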
In this case, arr is a matrix of two rows and three columns. The dtype property returns the type of the values stored in the array:
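Continuing with the same illustrative float array:

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]], float)
print(arr.dtype)   # float64
```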
float64 is a numeric type that stores double-precision (8-byte) real numbers, similar to the float type in regular Python. Other data types exist as well, such as int64, int32, and string types, and an array can be converted from one type to another. For example:
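A short sketch, assuming the conversion is done with the astype method (values are illustrative):

```python
import numpy as np

arr = np.array([1.7, 2.2, 3.9])
as_int = arr.astype(np.int32)   # fractional parts are truncated: [1 2 3]
back = as_int.astype(float)     # and back to float64: [1. 2. 3.]
```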
The len function returns the length of the first dimension when used on an array:
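For example, with an illustrative 4x2 array:

```python
import numpy as np

arr = np.zeros((4, 2))
print(len(arr))   # 4: only the size of the first dimension
```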
As with Python lists, the in keyword can be used to check whether a value is contained in an array:
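A minimal membership check on an illustrative matrix:

```python
import numpy as np

arr = np.array([[6, 4], [5, 9]], float)
print(2 in arr)   # False
print(5 in arr)   # True
```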
The reshape function rearranges the elements of an array into different dimensions. For example, a matrix with eight rows and one column can be reshaped into a matrix with four rows and two columns:
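A sketch of that reshaping, with the values 0 to 7 chosen for illustration:

```python
import numpy as np

arr = np.arange(8).reshape(8, 1)   # 8 rows, 1 column
reshaped = arr.reshape(4, 2)       # same 8 elements, now 4 rows x 2 columns
print(reshaped)
# [[0 1]
#  [2 3]
#  [4 5]
#  [6 7]]
```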
In addition, transposed matrices can be created: the transpose function returns a new array with its axes reversed, so for a matrix, rows and columns are swapped:
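For example, transposing an illustrative 2x3 matrix:

```python
import numpy as np

arr = np.arange(6).reshape(2, 3)   # [[0 1 2] [3 4 5]]
t = np.transpose(arr)              # shape (3, 2)
print(t)
# [[0 3]
#  [1 4]
#  [2 5]]
```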
Arrays can also be transposed using the T attribute:
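A quick check that the attribute matches the function:

```python
import numpy as np

arr = np.arange(6).reshape(2, 3)
print(arr.T.shape)                                # (3, 2)
print(np.array_equal(arr.T, np.transpose(arr)))   # True
```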
Another way to rearrange the elements of an array is to increase its dimensionality by indexing with newaxis (a constant, not a function):
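A sketch turning an illustrative 1-D array into a column and a row:

```python
import numpy as np

arr = np.array([14, 32, 13], float)
col = arr[:, np.newaxis]   # shape (3, 1): a column vector
row = arr[np.newaxis, :]   # shape (1, 3): a row vector
print(col.shape, row.shape)   # (3, 1) (1, 3)
```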
In this example, both new arrays have two dimensions, and the dimension created by newaxis has a length of one.
In NumPy, arrays are joined with the concatenate function, whose usage depends on the dimensionality of the arrays. Multiple one-dimensional arrays can be chained by passing the arrays to be joined as a tuple:
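For example, with three illustrative vectors of different lengths:

```python
import numpy as np

a = np.array([1, 2])
b = np.array([3, 4, 5])
c = np.array([6])
joined = np.concatenate((a, b, c))   # the arrays are passed as one tuple
print(joined)                        # [1 2 3 4 5 6]
```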
With multi-dimensional arrays, the axis along which the arrays are concatenated can be specified with the axis argument; otherwise, NumPy concatenates along the first dimension by default:
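A short sketch comparing the default axis with axis=1 on illustrative 2x2 matrices:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])
rows = np.concatenate((a, b))           # default axis=0, shape (4, 2)
cols = np.concatenate((a, b), axis=1)   # side by side, shape (2, 4)
print(cols)
# [[1 2 5 6]
#  [3 4 7 8]]
```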
It is common to save large amounts of data as binary files rather than as text. NumPy provides a function, tostring, that converts an array to a binary string, and the inverse operation, converting a binary string back into an array, is supported by the fromstring routine. In recent NumPy versions, tostring is deprecated in favor of the equivalent tobytes, and binary-mode fromstring in favor of frombuffer. For example:
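Because of those deprecations, this sketch uses tobytes and frombuffer; the array values are illustrative:

```python
import numpy as np

arr = np.array([1.5, 2.0, 3.25])
raw = arr.tobytes()                               # 3 float64 values -> 24 bytes
restored = np.frombuffer(raw, dtype=arr.dtype)    # the dtype must be given explicitly
print(len(raw))   # 24
```

Note that frombuffer returns a read-only view of the bytes; call copy on it if the result needs to be modified.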
Common mathematical operations are, of course, supported in NumPy, and they are applied element-wise. For example:
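A few element-wise operations on two illustrative vectors:

```python
import numpy as np

a = np.array([1, 2, 3], float)
b = np.array([5, 2, 6], float)
total = a + b      # element-wise sum
diff = a - b       # element-wise difference
prod = a * b       # element-wise product
ratio = b / a      # element-wise division
squares = a ** 2   # element-wise power
```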
Since these operations are applied element-wise, the arrays are required to have the same size. If this condition is not satisfied, an error is raised:
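A minimal sketch that catches the resulting ValueError; the exact wording of the message can vary between NumPy versions:

```python
import numpy as np

a = np.array([1, 2, 3], float)
b = np.array([4, 5], float)
try:
    a + b
except ValueError as err:
    message = str(err)
    print(message)   # reports that the operands could not be broadcast together
```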
The error states that the operands could not be broadcast together, because the only way to perform an operation on arrays of different sizes is a mechanism called broadcasting. When the arrays have a different number of dimensions, the array with fewer dimensions is repeated along the missing dimensions until it matches the shape of the other array (provided the remaining dimensions are compatible). Consider the following:
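An illustrative case: a 3x2 matrix plus a vector of length 2.

```python
import numpy as np

arr1 = np.array([[1, 2], [3, 4], [5, 6]], float)   # shape (3, 2)
arr2 = np.array([-1, 3], float)                    # shape (2,)
result = arr1 + arr2   # arr2 is added to every row of arr1
print(result)
# [[0. 5.]
#  [2. 7.]
#  [4. 9.]]
```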
The array arr2 was broadcast to a two-dimensional array matching the shape of arr1. In other words, arr2 was repeated for each row of arr1, which is equivalent to the array:
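A sketch that writes the repeated array out explicitly, using the same illustrative values, and checks it against np.broadcast_to:

```python
import numpy as np

arr1 = np.array([[1, 2], [3, 4], [5, 6]], float)
arr2 = np.array([-1, 3], float)
expanded = np.array([[-1, 3], [-1, 3], [-1, 3]], float)   # arr2 repeated per row

print(np.array_equal(np.broadcast_to(arr2, arr1.shape), expanded))   # True
print(np.array_equal(arr1 + arr2, arr1 + expanded))                  # True
```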
To make broadcasting explicit, the newaxis constant lets us specify along which dimension an array is broadcast:
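For example, the same vector can be broadcast as a row or as a column (values are illustrative):

```python
import numpy as np

a = np.zeros((2, 2), float)
b = np.array([-1., 3.])
by_row = a + b[np.newaxis, :]   # b treated as a row:    [[-1. 3.] [-1. 3.]]
by_col = a + b[:, np.newaxis]   # b treated as a column: [[-1. -1.] [3. 3.]]
```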
Unlike Python lists, arrays can be queried using conditions. A typical example is to use Boolean arrays to filter the elements:
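A minimal example of a Boolean mask built from a condition on an illustrative matrix:

```python
import numpy as np

arr = np.array([[6, 4], [5, 9]], float)
mask = arr >= 6      # Boolean array with the same shape as arr
selected = arr[mask] # elements where mask is True, as a 1-D array
print(selected)      # [6. 9.]
```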
Multiple Boolean expressions can be used to subset the array:
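Conditions are combined with & (and) or | (or), each wrapped in parentheses because of operator precedence:

```python
import numpy as np

arr = np.array([[6, 4], [5, 9]], float)
selected = arr[(arr > 4) & (arr < 9)]   # both conditions must hold
print(selected)                         # [6. 5.]
```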
Arrays of integers can be used to specify the indices to select the elements of another array. For example:
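A sketch with illustrative values whose index array matches the selection described next:

```python
import numpy as np

arr1 = np.array([2, 4, 6, 8], float)
arr2 = np.array([0, 1, 1, 3, 1, 1, 1])   # indices into arr1 (repeats allowed)
selected = arr1[arr2]
print(selected)   # [2. 4. 4. 8. 4. 4. 4.]
```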
Here arr2 holds the ordered indices used to select elements from arr1: the zeroth, first, first, third, first, first, and first elements of arr1 have been selected, in that order. Lists can also be used for the same purpose:
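The same selection with a plain Python list of indices:

```python
import numpy as np

arr1 = np.array([2, 4, 6, 8], float)
selected = arr1[[0, 1, 1, 3, 1, 1, 1]]   # a list works like an integer array
```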
To replicate the same operation with multi-dimensional arrays, one one-dimensional integer array per dimension has to be placed inside the selection brackets.
The first selection array gives the row indices of the selected matrix entries, while the second selection array gives their column indices. The following example illustrates the idea:
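A sketch with an illustrative matrix, chosen so that the first selected entry (row 1, column 1) is 13:

```python
import numpy as np

arr1 = np.arange(8, 20).reshape(3, 4)
# [[ 8  9 10 11]
#  [12 13 14 15]
#  [16 17 18 19]]
arr2 = np.array([1, 0, 2, 1])   # row indices
arr3 = np.array([1, 3, 0, 2])   # column indices
selected = arr1[arr2, arr3]     # entries (1,1), (0,3), (2,0), (1,2)
print(selected)                 # [13 11 16 14]
```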
The values in arr2 are the first (row) indices of the selected arr1 entries, while arr3 holds the second (column) indices, so the first chosen entry of arr1 is at row 1, column 1, which is 13.
The take function applies a selection with integer arrays and works in the same way as bracket selection:
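For example, reusing the illustrative 1-D selection from above:

```python
import numpy as np

arr1 = np.array([2, 4, 6, 8], float)
arr2 = np.array([0, 1, 1, 3, 1, 1, 1])
taken = arr1.take(arr2)   # same result as arr1[arr2]
```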
Subsets of a multi-dimensional array can be selected along a given dimension by specifying the axis argument of the take function:
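A short sketch selecting rows and then columns of an illustrative 2x2 matrix:

```python
import numpy as np

arr1 = np.array([[0, 1], [2, 3]], float)
arr2 = np.array([0, 0, 1])
rows = arr1.take(arr2, axis=0)   # rows 0, 0, 1    -> shape (3, 2)
cols = arr1.take(arr2, axis=1)   # columns 0, 0, 1 -> shape (2, 3)
```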
The put method is the opposite of take: it takes values from an array and places them at the specified indices of the array on which put is called:
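A minimal example with illustrative values; put modifies the calling array in place:

```python
import numpy as np

arr1 = np.array([0, 1, 2, 3, 4, 5], float)
arr2 = np.array([9, 8], float)
arr1.put([0, 3], arr2)   # arr1[0] = 9 and arr1[3] = 8
print(arr1)              # [9. 1. 2. 8. 4. 5.]
```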
We close this section by noting that multiplication also remains element-wise for two-dimensional arrays (it does not correspond to matrix multiplication):
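The sketch contrasts the * operator with np.dot on two illustrative 2x2 matrices:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]], float)
b = np.array([[2, 0], [1, 3]], float)
elementwise = a * b            # [[2, 0], [3, 12]]
matrix_product = np.dot(a, b)  # [[4, 6], [10, 12]]: true matrix multiplication
```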
Linear algebra operations
The most common operation between matrices is the product of a matrix with its transpose, $X^T X$, computed with np.dot:
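A minimal sketch with an illustrative 3x2 matrix X:

```python
import numpy as np

X = np.array([[1, 2], [3, 4], [5, 6]], float)   # shape (3, 2)
gram = np.dot(X.T, X)                           # 2x2 matrix X^T X
print(gram)
# [[35. 44.]
#  [44. 56.]]
```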
There are also functions that directly compute the different types of product (inner, outer, and cross) of arrays (that is, matrices or vectors).
For one-dimensional arrays (vectors) the inner product corresponds to the dot product:
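The three products on a pair of illustrative 3-vectors:

```python
import numpy as np

a = np.array([1, 4, 0], float)
b = np.array([2, 2, 1], float)
inner = np.inner(a, b)   # 10.0, the same as np.dot(a, b)
outer = np.outer(a, b)   # 3x3 matrix with entries a[i] * b[j]
cross = np.cross(a, b)   # vector perpendicular to a and b: [4, -1, -6]
```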
NumPy also contains a submodule, linalg, with a series of functions for linear algebra calculations on matrices. The determinant of a matrix can be computed as:
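A sketch with an illustrative 3x3 matrix whose determinant is -48:

```python
import numpy as np

m = np.array([[4, 2, 0], [9, 3, 7], [1, 2, 1]], float)
d = np.linalg.det(m)   # floating-point result, so compare with a tolerance
print(round(d, 6))     # -48.0
```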
The inverse of a matrix can be computed with the inv function:
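Using the same illustrative matrix, the product with its inverse should be the identity up to rounding:

```python
import numpy as np

m = np.array([[4, 2, 0], [9, 3, 7], [1, 2, 1]], float)
m_inv = np.linalg.inv(m)
print(np.allclose(np.dot(m, m_inv), np.identity(3)))   # True
```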
It is straightforward to calculate the eigenvalues and eigenvectors of a matrix:
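A minimal sketch with an illustrative 2x2 matrix (eigenvalues 5 and 2), checking the defining relation for each eigenpair:

```python
import numpy as np

m = np.array([[4, 1], [2, 3]], float)
vals, vecs = np.linalg.eig(m)   # eigenvalues and eigenvectors (as columns)

# Each column v of vecs satisfies m @ v == lambda * v (up to rounding)
for i in range(len(vals)):
    assert np.allclose(np.dot(m, vecs[:, i]), vals[i] * vecs[:, i])
```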