Matrices as transformations
Matrices are typically applied to vectors and other matrices through the process of matrix multiplication. However, the dry mechanics of matrix multiplication tend to hide what a matrix really represents and what matrix multiplication does. We aim to shed light on what matrices really are in this chapter. We’ll start by covering the basics of matrix multiplication and then show how matrices represent transformations.
Matrix multiplication
If we have a matrix $A$ of size $n \times m$, and a matrix $B$ of size $m \times p$, then we can multiply those two matrices together to get a new matrix $C = AB$, which is of size $n \times p$. The matrix element $C_{ij}$ is calculated as follows:
$$C_{ij} = \sum_{k=1}^{m} A_{ik} B_{kj} \qquad \text{(Eq. 10)}$$
In this example, we are multiplying an $n \times m$ matrix by an $m \times p$ matrix. Schematically, we can write this as follows:
$$(n \times m)\,(m \times p) = (n \times p) \qquad \text{(Eq. 11)}$$
From this, it is clear that the “inner” dimensions in this example match, both being $m$. To multiply two matrices together, the inner dimensions must match when we write the multiplication out in this schematic way. If the dimensions do not match, we cannot multiply the matrices together. For example, we cannot multiply a $2 \times 3$ matrix by a $2 \times 3$ matrix, since the inner dimensions, 3 and 2, do not match.
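As a quick sketch of the dimension rule in NumPy (using arbitrary matrices, not an example from the book’s notebook), compatible shapes multiply cleanly, while mismatched inner dimensions raise an error:

import numpy as np

A = np.random.rand(2, 3)   # a 2x3 matrix
B = np.random.rand(3, 4)   # a 3x4 matrix

C = np.matmul(A, B)        # inner dimensions match (3 and 3)
print(C.shape)             # (2, 4)

try:
    np.matmul(B, A)        # inner dimensions are 4 and 2 -- not allowed
except ValueError as err:
    print("Cannot multiply:", err)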
The inner product as matrix multiplication
In fact, if we think of an $n$-dimensional column vector $\mathbf{a}$ as an $n \times 1$ matrix, and its transpose, $\mathbf{a}^{\top}$, which is an $n$-dimensional row vector, as a $1 \times n$ matrix, then multiplying those two matrices together as $\mathbf{a}^{\top}\mathbf{a}$ gives a $1 \times 1$ result, or in other words, a scalar. The value of that scalar is calculated using the right-hand side of Eq. 10 for matrix multiplication and gives the same calculation as Eq. 2 for the inner product. In other words, we have the following:
$$\mathbf{a}^{\top}\mathbf{a} = \mathbf{a} \cdot \mathbf{a} \qquad \text{(Eq. 12)}$$
More generally, if we have two vectors $\mathbf{a}$ and $\mathbf{b}$ of the same length, then $\mathbf{a}^{\top}\mathbf{b} = \mathbf{a} \cdot \mathbf{b}$. Let’s look at the last formula more schematically, as shown in Figure 3.2:

Figure 3.2: Matrix multiplication of a row vector and column vector is the same as the inner product
The left-hand side of the schematic equation in Figure 3.2 is a matrix multiplication, but if we calculate that matrix multiplication by hand, we get the expression on the right-hand side of the figure, which is just the inner product $\mathbf{a} \cdot \mathbf{b}$.
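As a small sketch with made-up numbers, we can check this in NumPy by reshaping the vectors into a $1 \times n$ row and an $n \times 1$ column:

import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, -1.0, 2.0])

# Treat a as a 1x3 row vector and b as a 3x1 column vector
row = a.reshape(1, 3)
col = b.reshape(3, 1)

print(np.matmul(row, col))   # [[8.]] -- a 1x1 matrix
print(np.inner(a, b))        # 8.0   -- the same value as a scalar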
Matrix multiplication as a series of inner products
We can extend this connection between inner products and matrix multiplication by looking again at the right-hand side of Eq. 10. The matrix element $C_{ij}$ looks like an inner product between a vector formed from the $i^{\text{th}}$ row of matrix $A$ and the $j^{\text{th}}$ column of matrix $B$. That means we can represent the matrix multiplication $C = AB$ schematically as follows:

Figure 3.3: Matrix elements resulting from a matrix multiplication can be viewed as inner product calculations
In fact, this schematic is how I remember how to do matrix multiplication, not the dry formula given in Eq. 10.
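This view is easy to express in code. As a sketch (with arbitrary matrices), each element C[i, j] is the inner product of the i-th row of A with the j-th column of B:

import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])   # 3x2
B = np.array([[1.0, 0.0, 2.0], [-1.0, 1.0, 0.0]])    # 2x3

C = np.zeros((3, 3))
for i in range(3):
    for j in range(3):
        # Inner product of the i-th row of A with the j-th column of B
        C[i, j] = np.inner(A[i, :], B[:, j])

print(np.allclose(C, np.matmul(A, B)))   # True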
Matrix multiplication is not commutative
Matrix multiplication is, of course, the matrix counterpart of the ordinary multiplication of real numbers that we are familiar with from school. Even the notation $AB$ gives the impression that matrix multiplication will follow the same rules and patterns as ordinary multiplication. This is not the case. There are some subtleties and nuances with matrix multiplication. One of these subtleties to be aware of is that the order of the matrices matters. In general, for two different matrices $A$ and $B$, we have $AB \neq BA$. We say that matrix multiplication is not commutative.
To see this more concretely, let’s take an explicit example. Consider these two matrices:
Eq. 13
You can confirm for yourself, by doing the matrix multiplications by hand, that the following apply:
Eq. 14
Obviously, there are special cases where matrix multiplication does commute, for example, the trivial case when $A = B$, in which case $AB = BA$. There are also cases when $A$ and $B$ are different and yet we have $AB = BA$. These special cases require extra conditions on the properties of the matrices $A$ and $B$, but for now, we can say that in general, $AB \neq BA$, so be careful you don’t assume it.
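The matrices from Eq. 13 aren’t reproduced here, but you can check the same point with almost any pair of matrices; as a sketch with randomly generated matrices:

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 2))
B = rng.standard_normal((2, 2))

# For randomly chosen matrices, the two orderings essentially never agree
print(np.allclose(np.matmul(A, B), np.matmul(B, A)))   # False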
The outer product as a matrix multiplication
Just as we showed that the inner product between two vectors could be written as a matrix multiplication, we can do the same for calculating the outer product between two vectors. If we have an $n$-component real-valued column vector $\mathbf{a}$, we can think of it as an $n \times 1$ matrix. Likewise, if we have an $m$-component real-valued row vector $\mathbf{b}^{\top}$, then we can think of it as a $1 \times m$ matrix. We can then multiply these two matrices together to get $\mathbf{a}\mathbf{b}^{\top}$. From the rules of matrix multiplication, this is an $n \times m$ matrix whose $ij^{\text{th}}$ matrix element is given by $a_i b_j$, which is the same as we get when we calculate the outer product, $\mathbf{a} \otimes \mathbf{b}$, between the vectors $\mathbf{a}$ and $\mathbf{b}$. Because of this, we almost always use the more succinct notation $\mathbf{a}\mathbf{b}^{\top}$ to denote the outer product $\mathbf{a} \otimes \mathbf{b}$ when $\mathbf{a}$ and $\mathbf{b}$ are real-valued. Schematically, we have the following:
$$\mathbf{a}\mathbf{b}^{\top}: \; (n \times 1)\,(1 \times m) = (n \times m) \qquad \text{(Eq. 15)}$$
You’ll recall that we could also calculate the outer product, $\mathbf{b} \otimes \mathbf{a}$, from the vectors $\mathbf{b}$ and $\mathbf{a}$. You will have guessed that we can also write this outer product as the matrix multiplication $\mathbf{b}\mathbf{a}^{\top}$ when $\mathbf{b}$ and $\mathbf{a}$ are real-valued. Again, this notation is more commonly used to represent the outer product, rather than $\mathbf{b} \otimes \mathbf{a}$.
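As a quick sketch with made-up vectors, the same outer product can be computed either with NumPy’s np.outer function or as an $n \times 1$ matrix times a $1 \times m$ matrix:

import numpy as np

a = np.array([1.0, 2.0, 3.0])        # n = 3
b = np.array([4.0, 5.0])             # m = 2

outer1 = np.outer(a, b)                               # 3x2 matrix
outer2 = np.matmul(a.reshape(3, 1), b.reshape(1, 2))  # same thing, via matmul

print(np.allclose(outer1, outer2))   # True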
Multiplying multiple matrices together
Once we know how to multiply two matrices together, it is a simple matter to multiply many matrices together – we simply take them two at a time. For example, if we have matrices $A$, $B$, and $C$, then their product $ABC$ can be calculated via the following:
$$ABC = (AB)\,C = A\,(BC) \qquad \text{(Eq. 16)}$$
We can either multiply $A$ and $B$ together first and then multiply the result by $C$, or we can multiply $B$ and $C$ together first and then multiply $A$ by the result. Either way, we get the same result. This means matrix multiplication is associative.
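As a quick numerical check of associativity (a sketch with randomly generated matrices):

import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((2, 3))
B = rng.standard_normal((3, 4))
C = rng.standard_normal((4, 2))

left = np.matmul(np.matmul(A, B), C)   # (AB)C
right = np.matmul(A, np.matmul(B, C))  # A(BC)

print(np.allclose(left, right))        # True, up to floating-point rounding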
Transforming a vector by matrix multiplication
So far, we have learned about vectors and matrices and their basic properties. We have also learned how to multiply matrices together. We have even seen how we can consider a vector as a special kind of matrix. This immediately raises the question of what happens if we multiply a vector by a matrix – what do we get and what does that multiplication represent?
Consider an $n \times m$ matrix $A$ and an $m$-component column vector $\mathbf{v}$. As we can think of the vector $\mathbf{v}$ as an $m \times 1$ matrix, we can clearly multiply $\mathbf{v}$ by $A$ using the rules of matrix multiplication. In fact, we get the following:
$$A\mathbf{v} = \begin{pmatrix} A_{11} & A_{12} & \cdots & A_{1m} \\ A_{21} & A_{22} & \cdots & A_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ A_{n1} & A_{n2} & \cdots & A_{nm} \end{pmatrix} \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_m \end{pmatrix} = \begin{pmatrix} A_{11}v_1 + A_{12}v_2 + \cdots + A_{1m}v_m \\ A_{21}v_1 + A_{22}v_2 + \cdots + A_{2m}v_m \\ \vdots \\ A_{n1}v_1 + A_{n2}v_2 + \cdots + A_{nm}v_m \end{pmatrix} \qquad \text{(Eq. 17)}$$
The result is an $n \times 1$ matrix, that is, an $n$-component column vector. So, multiplying a vector by a matrix gives us another vector. The components of this new vector are given by the expressions inside the brackets on the right-hand side of Eq. 17. The components of this new vector are (in general) different from those of vector $\mathbf{v}$, so the effect of multiplying a vector by a matrix is to transform that vector. From this, we can conclude that matrices represent transformations.
If we look more closely at the individual expressions in the vector on the right-hand side of Eq. 17, we can see that each component in the new vector is a linear combination of the components in the old vector $\mathbf{v}$. So, the matrix $A$ represents a linear transformation. The individual matrix elements $A_{11}$, $A_{12}$, and so on tell us the weights in those linear combinations that give us the components of the new vector. In other words, the individual matrix elements encode the details of the linear transformation.
One thing we haven’t spoken about yet is what effect the relative sizes of $n$ and $m$ have. If $n = m$, then obviously, multiplying an $m$-component vector $\mathbf{v}$ by the $m \times m$ matrix $A$ gives us another $m$-component vector. Although we have transformed the vector, we have, in this case, stayed within the same $m$-dimensional space. However, if $n < m$, then our new vector has fewer components than the starting vector $\mathbf{v}$, and so we have reduced the dimensionality. Alternatively, if $n > m$, our new vector has more components than we started with, and we have increased the dimensionality.
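As a sketch of how the shape of the matrix sets the dimensionality of the output (the matrices here are randomly generated purely for illustration):

import numpy as np

rng = np.random.default_rng(2)
v = np.array([1.0, 2.0, 3.0])           # an m-component vector, m = 3

A_same = rng.standard_normal((3, 3))    # n = m = 3: stays 3-dimensional
A_down = rng.standard_normal((2, 3))    # n = 2 < m: reduces dimensionality
A_up = rng.standard_normal((5, 3))      # n = 5 > m: increases dimensionality

print(np.matmul(A_same, v).shape)   # (3,)
print(np.matmul(A_down, v).shape)   # (2,)
print(np.matmul(A_up, v).shape)     # (5,)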
In all the examples previously, we have been multiplying a column vector by a matrix. But we can equally multiply a matrix by a row vector. Let’s stick with our vector $\mathbf{v}$, but we will use its row vector form $\mathbf{v}^{\top}$. Now we can think of $\mathbf{v}^{\top}$ as a $1 \times m$ matrix. So, if we have an $m \times p$ matrix $B$, then we can perform the matrix multiplication $\mathbf{v}^{\top}B$, and we get a $1 \times p$ matrix out of it, that is, a $p$-component row vector. As you might expect, this new $p$-component vector is just a linear transformation of our starting $m$-component vector $\mathbf{v}$, with the details of the linear transformation encoded in the matrix elements $B_{ij}$.
Finally, we should highlight that since matrix multiplication is a linear transformation, if we apply a matrix $A$ to a linear combination of vectors, the result is the same as combining the results of applying $A$ to each vector individually. In more detail, we have the following:
$$A\,(\alpha\mathbf{u} + \beta\mathbf{v}) = \alpha A\mathbf{u} + \beta A\mathbf{v} \qquad \text{(Eq. 18)}$$
We will make use of this fact shortly.
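As a quick numerical check of Eq. 18 (a sketch with randomly generated values):

import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((3, 3))
u = rng.standard_normal(3)
v = rng.standard_normal(3)
alpha, beta = 2.0, -0.5

lhs = np.matmul(A, alpha * u + beta * v)
rhs = alpha * np.matmul(A, u) + beta * np.matmul(A, v)

print(np.allclose(lhs, rhs))   # True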
The identity matrix
Now that we have learned that matrix multiplication represents the linear transformation of vectors, let’s look at some particular special cases of transformations. Consider the matrix $I$ given here:
$$I = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix} \qquad \text{(Eq. 19)}$$
The matrix $I$ has 1 for each matrix element along its diagonal and 0 everywhere else. Now, what is the effect of multiplying by $I$? Let’s try it. Consider an $n$-component column vector $\mathbf{v}$. If we multiply $\mathbf{v}$ by $I$, we get the result shown here:
$$I\mathbf{v} = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix} \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{pmatrix} = \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{pmatrix} = \mathbf{v} \qquad \text{(Eq. 20)}$$
So, multiplying any vector $\mathbf{v}$ by $I$ just gives us back $\mathbf{v}$ itself. We haven’t done anything to the starting vector. The transformation represented by $I$ is just the identity transformation, which leaves vectors untouched. Hence, $I$ is called the identity matrix. Or, more specifically, it is the $n$-dimensional identity matrix because it operates on $n$-component vectors.
It is a simple matter to confirm, via a similar calculation to the previous one, that if we reverse the order of the calculation, so that we multiply a row vector $\mathbf{v}^{\top}$ by $I$, we leave the row vector unchanged. In terms of math notation, we have the following:
$$\mathbf{v}^{\top} I = \mathbf{v}^{\top} \qquad \text{(Eq. 21)}$$
Now remember that when we explained matrix multiplication as a series of inner products, we learned that we could think of a matrix as a set of column vectors, so it is not surprising that when we multiply an $n \times n$ matrix $A$ by $I$, we leave the matrix untouched. In terms of the math, we have the following:
$$IA = A \qquad \text{(Eq. 22)}$$
Again, if we multiply them in the opposite order, we also leave the matrix unchanged, so in terms of the math, we have the following:
$$AI = A \qquad \text{(Eq. 23)}$$
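In NumPy, the identity matrix is produced by np.eye; as a brief sketch of Eq. 20, Eq. 22, and Eq. 23 (with made-up values):

import numpy as np

I = np.eye(3)                       # the 3-dimensional identity matrix
v = np.array([1.0, -2.0, 0.5])
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 9.0]])

print(np.allclose(np.matmul(I, v), v))   # I v = v
print(np.allclose(np.matmul(I, A), A))   # I A = A
print(np.allclose(np.matmul(A, I), A))   # A I = A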
The inverse matrix
If the identity matrix leaves an $n \times n$ matrix untouched, we can think of it as the matrix analog of multiplying a number by 1. For any number $x$ on the real number line, we have $1 \times x = x$ and $x \times 1 = x$. The number 1 here is called the identity element. For a number $x$, we also have the concept of its reciprocal, $x^{-1} = 1/x$, which is the number we multiply $x$ by to get the identity element, so that $x^{-1}$ is defined by the following relationship:
$$x^{-1}\,x = x\,x^{-1} = 1 \qquad \text{(Eq. 24)}$$
For an $n \times n$ matrix $A$, we have an analogous concept – the inverse matrix of $A$, which is denoted by the symbol $A^{-1}$. The matrix $A^{-1}$ is an $n \times n$ matrix and, as you might have guessed, is defined as the matrix we multiply $A$ by to get the identity element, the matrix $I$ in this case. So, $A^{-1}$ is defined by the following relationship:
$$A^{-1}A = AA^{-1} = I \qquad \text{(Eq. 25)}$$
Conceptually, we can think of $A^{-1}$ as playing a similar role and having similar properties to the reciprocal $x^{-1}$ in ordinary arithmetic. Just like the reciprocal in ordinary arithmetic, the inverse matrix can be extremely useful in simplifying mathematical expressions by canceling other terms out.
Note that the inverse matrix is only defined for square matrices. Non-square matrices do not have a proper inverse. However, not all square matrices necessarily have an inverse. That is, there are some square matrices, $A$, for which there are no solutions, $A^{-1}$, to the relation in Eq. 25. We will talk more about that later when we introduce eigen-decompositions of a square matrix.
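The code example at the end of this section uses NumPy’s np.linalg.inv function to compute an inverse. As a small sketch here, asking for the inverse of a square matrix that has no inverse (a singular matrix) makes NumPy raise an error:

import numpy as np

# A singular matrix: the second row is twice the first,
# so no inverse exists
S = np.array([[1.0, 2.0],
              [2.0, 4.0]])

try:
    np.linalg.inv(S)
except np.linalg.LinAlgError as err:
    print("No inverse:", err)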
More examples of matrices as transformations
Let’s look at another specific example of a matrix and understand its effect as a transformation. Consider the matrix $A$ given here:
Eq. 26
Clearly, matrix $A$ operates on two-component vectors that live in a two-dimensional plane. We can think of that plane as being the usual $(x, y)$ plane. What transformation does this represent? Let’s break it down. Let’s look at the effect of the transformation represented by $A$ on a specific vector. In this case, we’re going to choose the vector that represents the $x$ axis. In column vector form, this vector is as follows:
$$\mathbf{e}_x = \begin{pmatrix} 1 \\ 0 \end{pmatrix} \qquad \text{(Eq. 27)}$$
All other vectors representing points on the $x$ axis are just multiples of the vector in Eq. 27. Now, what is the effect of $A$ on this vector? It is easy to compute, and we find the following:
Eq. 28
The new vector on the right-hand side of Eq. 28 represents a point in the plane that has identical and positive $x$ and $y$ components. In other words, it represents a 45° anti-clockwise rotation of our starting point, which was on the $x$ axis.
Let’s look at the effect of $A$ on another vector. This time, we’re going to choose a vector that represents a point on the $y$ axis. In column vector form, this vector is as follows:
$$\mathbf{e}_y = \begin{pmatrix} 0 \\ 1 \end{pmatrix} \qquad \text{(Eq. 29)}$$
All other vectors representing points on the $y$ axis are just multiples of the vector in Eq. 29. The effect of $A$ on this vector is as follows:
Eq. 30
The new vector on the right-hand side of Eq. 30 represents a point in the second quadrant of the $(x, y)$ plane, and again represents a 45° anti-clockwise rotation of our starting point on the $y$ axis. The effect of matrix $A$ on the vectors $\mathbf{e}_x$ and $\mathbf{e}_y$ is illustrated schematically in Figure 3.4:

Figure 3.4: Schematic illustration of the effect of matrix A
Now, any two-dimensional vector can be written as a sum of the two vectors we have just studied. To show this, consider the following:
$$\begin{pmatrix} x \\ y \end{pmatrix} = x \begin{pmatrix} 1 \\ 0 \end{pmatrix} + y \begin{pmatrix} 0 \\ 1 \end{pmatrix} = x\,\mathbf{e}_x + y\,\mathbf{e}_y \qquad \text{(Eq. 31)}$$
Given that the effect of $A$ on both $\mathbf{e}_x$ and $\mathbf{e}_y$ is a 45° anti-clockwise rotation, and that $A$ is a linear transformation (Eq. 18), the effect of $A$ on any two-dimensional vector will be a 45° anti-clockwise rotation. Therefore, as a transformation, $A$ is a matrix that represents a 45° anti-clockwise rotation.
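The specific entries of the matrix in Eq. 26 aren’t reproduced here, but any matrix that rotates the plane by 45° anti-clockwise behaves as described. As a sketch, here is the standard 2D rotation matrix (with θ = 45°) applied to the two axis vectors:

import numpy as np

theta = np.pi / 4   # 45 degrees
# Standard 2D anti-clockwise rotation matrix
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

e_x = np.array([1.0, 0.0])
e_y = np.array([0.0, 1.0])

print(np.matmul(R, e_x))   # [0.707 0.707]  -- equal, positive x and y components
print(np.matmul(R, e_y))   # [-0.707 0.707] -- a point in the second quadrant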
Since any two-dimensional vector can be written as a linear combination of the vectors $\mathbf{e}_x$ and $\mathbf{e}_y$, these two vectors are called basis vectors – they provide a basis from which we can construct all other two-dimensional vectors. These two vectors are also orthogonal to each other. In geometric terms, this means they are at right angles to each other – this is obvious in this example because one vector lies along the $x$ axis while the other lies along the $y$ axis. In algebraic terms, orthogonality means the inner product between the two vectors is 0. Basis vectors don’t have to be orthogonal to each other. For example, two non-orthogonal vectors, such as $(1, 0)^{\top}$ and $(1, 1)^{\top}$, can also be used to describe any point on the $(x, y)$ plane. However, when basis vectors are orthogonal, they are easy to work with. Moving along one orthogonal basis vector does not change how far along we are on another orthogonal basis vector. For example, moving along the $x$ axis does not affect where we are on the $y$ axis. This means we can apply calculations along one orthogonal basis vector without having to worry about what is happening in terms of the other basis vectors. This makes orthogonal basis vectors very convenient to work with – a fact we will make use of when we move on to decompositions of matrices in the next section.
Given a set of orthogonal basis vectors $\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n$ in an $n$-dimensional space, we can easily work out how to represent any vector $\mathbf{v}$ in terms of those basis vectors. Say we have a vector $\mathbf{v}$ and we want to write it as follows:
$$\mathbf{v} = w_1\mathbf{e}_1 + w_2\mathbf{e}_2 + \cdots + w_n\mathbf{e}_n = \sum_{i=1}^{n} w_i\,\mathbf{e}_i \qquad \text{(Eq. 32)}$$
Then, we can work out the values of the weights $w_i$ by taking the inner product of both sides of Eq. 32 with each of the basis vectors $\mathbf{e}_j$. Doing so, we get the following:
$$\mathbf{e}_j \cdot \mathbf{v} = \sum_{i=1}^{n} w_i\,(\mathbf{e}_j \cdot \mathbf{e}_i) \qquad \text{(Eq. 33)}$$
Since $\mathbf{e}_j$ is, by definition, orthogonal to all the other basis vectors except itself, the inner products $\mathbf{e}_j \cdot \mathbf{e}_i = 0$ unless $i = j$. Plugging this fact into the preceding equation, we get the following:
$$w_j = \frac{\mathbf{e}_j \cdot \mathbf{v}}{\mathbf{e}_j \cdot \mathbf{e}_j} \qquad \text{(Eq. 34)}$$
So, we can easily work out the required weights. If the basis vectors are all of unit length, so that $\mathbf{e}_j \cdot \mathbf{e}_j = 1$ for every value of $j$, then the expression in Eq. 34 for the weights becomes even easier. It becomes $w_j = \mathbf{e}_j \cdot \mathbf{v}$. A set of orthogonal basis vectors that are of unit length are called orthonormal and form an orthonormal basis. Using an orthonormal basis to represent our vectors is extremely convenient. In the next section of this chapter, we will show how an orthonormal basis can be extracted from any matrix and therefore can be used as an extremely convenient way of working with that matrix. But for now, let’s look at how to do some of those matrix multiplications and transformations in a code example.
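First, though, here is a quick sketch of Eq. 34 in code, projecting a vector onto an orthonormal basis (the basis and vector here are invented purely for illustration):

import numpy as np

# An orthonormal basis for 2D: the axis vectors rotated by 45 degrees
e1 = np.array([1.0, 1.0]) / np.sqrt(2.0)
e2 = np.array([-1.0, 1.0]) / np.sqrt(2.0)

v = np.array([3.0, -1.0])

# Since the basis is orthonormal, the weights are just inner products
w1 = np.inner(e1, v)
w2 = np.inner(e2, v)

# Reconstruct v from the basis vectors and weights
print(np.allclose(w1 * e1 + w2 * e2, v))   # True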
Matrix transformation code example
For our code example, we’ll use the in-built functions in the NumPy package to do this. All the code examples that follow (and additional ones) can be found in the Code_Examples_Chap3.ipynb Jupyter notebook in the GitHub repository.
First, we’ll use the numpy.matmul function to multiply two matrices together:
import numpy as np

# Create 3x3 matrices
A = np.array([[1.0, 2.0, 1.0],
              [-2.5, 1.0, 0.0],
              [3.0, 1.0, 1.5]])
B = np.array([[1.0, -1.0, -1.0],
              [5.0, 2.0, 3.0],
              [3.0, 1.0, 2.0]])

# Multiply the matrices together
np.matmul(A, B)
The preceding code produces the following output:
array([[14. ,  4. ,  7. ],
       [ 2.5,  4.5,  5.5],
       [12.5,  0.5,  3. ]])
We can use the same NumPy function to multiply a vector by a matrix:
# Create a 4-dimensional vector
a = np.array([1.0, 2.0, 3.0, -2.0])

# Create a 3x4 matrix
A = np.array([[1.0, 1.0, 0.0, 1.0],
              [-2.0, 2.5, 1.5, 3.0],
              [0.0, 1.0, 1.0, 4.0]])

# We'll use the matrix multiplication function to calculate A*a
np.matmul(A, a)
We get the following output:
array([ 1. , 1.5, -3. ])
The NumPy package even has an in-built function for calculating the inverse of a matrix, as the following code demonstrates:
# Create a 4x4 square matrix
A = np.array([[1, 2, 3, 4],
              [2, 1, 2, 1],
              [0, 1, 3, 2],
              [1, 1, 2, 2]])

# Calculate and store the inverse matrix
Ainv = np.linalg.inv(A)

# Multiply the matrix by its inverse.
# We should get the identity matrix
# [[1,0,0,0], [0,1,0,0], [0,0,1,0], [0,0,0,1]]
# up to numerical precision
np.matmul(Ainv, A)
These simple code examples of matrix transformations bring this section neatly to a close, so let’s recap what we have learned in this section.
What we learned
In this section, we have learned the following:
- How to multiply matrices together
- How to multiply a vector by a matrix and vice versa
- What the identity matrix is and its effect on any other matrix
- What the inverse of a matrix is and why it is useful
- How a matrix represents a linear transformation
- How sets of orthonormal vectors provide a convenient basis on which we can express any other vector
Having learned the basics of matrix multiplication and how matrices represent transformations, we’ll now learn some standard ways of representing or decomposing matrices. These decompositions help us to understand in more detail the effect of a matrix and provide convenient ways to work with and manipulate matrices.