Matrices as transformations
Matrices are typically applied to vectors and other matrices through the process of matrix multiplication. However, the dry mechanics of matrix multiplication tend to hide what a matrix really represents and what matrix multiplication does. We aim to shed light on what matrices really are in this chapter. We’ll start by covering the basics of matrix multiplication and then show how matrices represent transformations.
Matrix multiplication
If we have a matrix $A$ of size $n \times m$, and a matrix $B$ of size $m \times p$, then we can multiply those two matrices together to get a new matrix $C = AB$, which is of size $n \times p$. The matrix element $C_{ij}$ is calculated as follows:
$$C_{ij} = \sum_{k=1}^{m} A_{ik} B_{kj} \qquad \text{(Eq. 10)}$$
In this example, we are multiplying an $n \times m$ matrix by an $m \times p$ matrix. Schematically, we can write this as follows:
$$(n \times m)\,(m \times p) = (n \times p) \qquad \text{(Eq. 11)}$$
From this, it is clear that the “inner” dimensions in this example match, both being $m$. To multiply two matrices together, the inner dimensions must match when we write the multiplication out in this schematic way. If the dimensions do not match, we cannot multiply the matrices together. For example, we cannot multiply a $2 \times 3$ matrix by a $2 \times 3$ matrix, since the inner dimensions, 3 and 2, do not match.
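As a quick sketch of the dimension rule in NumPy (using arbitrary matrices, not an example from the book’s notebook), compatible shapes multiply cleanly, while mismatched inner dimensions raise an error:

import numpy as np

A = np.random.rand(2, 3)   # a 2x3 matrix
B = np.random.rand(3, 4)   # a 3x4 matrix

C = np.matmul(A, B)        # inner dimensions match (3 and 3)
print(C.shape)             # (2, 4)

try:
    np.matmul(B, A)        # inner dimensions are 4 and 2 -- not allowed
except ValueError as err:
    print("Cannot multiply:", err)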
The inner product as matrix multiplication
In fact, if we think of an $n$-dimensional column vector $\mathbf{a}$ as an $n \times 1$ matrix, and its transpose, $\mathbf{a}^{\top}$, which is an $n$-dimensional row vector, as a $1 \times n$ matrix, then multiplying those two matrices together as $\mathbf{a}^{\top}\mathbf{a}$ gives a $1 \times 1$ result, or in other words, a scalar. The value of that scalar is calculated using the right-hand side of Eq. 10 for matrix multiplication and gives the same calculation as Eq. 2 for the inner product. In other words, we have the following:
$$\mathbf{a}^{\top}\mathbf{a} = \mathbf{a} \cdot \mathbf{a} \qquad \text{(Eq. 12)}$$
More generally, if we have two vectors $\mathbf{a}$ and $\mathbf{b}$ of the same length, then $\mathbf{a}^{\top}\mathbf{b} = \mathbf{a} \cdot \mathbf{b}$. Let’s look at the last formula more schematically, as shown in Figure 3.2:

Figure 3.2: Matrix multiplication of a row vector and column vector is the same as the inner product
The left-hand side of the schematic equation in Figure 3.2 is a matrix multiplication, but if we calculate that matrix multiplication by hand, we get the expression on the right-hand side of the figure, which is just the inner product $\mathbf{a} \cdot \mathbf{b}$.
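As a small sketch with made-up numbers, we can check this in NumPy by reshaping the vectors into a $1 \times n$ row and an $n \times 1$ column:

import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, -1.0, 2.0])

# Treat a as a 1x3 row vector and b as a 3x1 column vector
row = a.reshape(1, 3)
col = b.reshape(3, 1)

print(np.matmul(row, col))   # [[8.]] -- a 1x1 matrix
print(np.inner(a, b))        # 8.0   -- the same value as a scalar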
Matrix multiplication as a series of inner products
We can extend this connection between inner products and matrix multiplication by looking again at the right-hand side of Eq. 10. The matrix element $C_{ij}$ looks like an inner product between a vector formed from the $i^{\text{th}}$ row of matrix $A$ and the $j^{\text{th}}$ column of matrix $B$. That means we can represent the matrix multiplication $C = AB$ schematically as follows:

Figure 3.3: Matrix elements resulting from a matrix multiplication can be viewed as inner product calculations
In fact, this schematic is how I remember how to do matrix multiplication, not the dry formula given in Eq. 10.
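This view is easy to express in code. As a sketch (with arbitrary matrices), each element C[i, j] is the inner product of the i-th row of A with the j-th column of B:

import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])   # 3x2
B = np.array([[1.0, 0.0, 2.0], [-1.0, 1.0, 0.0]])    # 2x3

C = np.zeros((3, 3))
for i in range(3):
    for j in range(3):
        # Inner product of the i-th row of A with the j-th column of B
        C[i, j] = np.inner(A[i, :], B[:, j])

print(np.allclose(C, np.matmul(A, B)))   # True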
Matrix multiplication is not commutative
Matrix multiplication is, of course, the matrix counterpart of the ordinary multiplication of real numbers that we are familiar with from school. Even the notation $AB$ gives the impression that matrix multiplication will follow the same rules and patterns as ordinary multiplication. This is not the case. There are some subtleties and nuances with matrix multiplication. One of these subtleties to be aware of is that the order of the matrices matters. In general, for two different matrices $A$ and $B$, we have $AB \neq BA$. We say that matrix multiplication is not commutative.
To see this more concretely, let’s take an explicit example. Consider these two matrices:
Eq. 13
You can confirm for yourself, by doing the matrix multiplications by hand, that the following apply:
Eq. 14
Obviously, there are special cases where matrix multiplication does commute, for example, the trivial case when $A = B$, in which case $AB = BA$. There are also cases when $A$ and $B$ are different and yet we have $AB = BA$. These special cases require extra conditions on the properties of the matrices $A$ and $B$, but for now, we can say that in general, $AB \neq BA$, so be careful you don’t assume it.
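The matrices from Eq. 13 aren’t reproduced here, but you can check the same point with almost any pair of matrices; as a sketch with randomly generated matrices:

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 2))
B = rng.standard_normal((2, 2))

# For randomly chosen matrices, the two orderings essentially never agree
print(np.allclose(np.matmul(A, B), np.matmul(B, A)))   # False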
The outer product as a matrix multiplication
Just as we showed that the inner product between two vectors could be written as a matrix multiplication, we can do the same for calculating the outer product between two vectors. If we have an $n$-component real-valued column vector $\mathbf{a}$, we can think of it as an $n \times 1$ matrix. Likewise, if we have an $m$-component real-valued row vector $\mathbf{b}^{\top}$, then we can think of it as a $1 \times m$ matrix. We can then multiply these two matrices together to get $\mathbf{a}\mathbf{b}^{\top}$. From the rules of matrix multiplication, this is an $n \times m$ matrix whose $ij^{\text{th}}$ matrix element is given by $a_i b_j$, which is the same as we get when we calculate the outer product, $\mathbf{a} \otimes \mathbf{b}$, between the vectors $\mathbf{a}$ and $\mathbf{b}$. Because of this, we almost always use the more succinct notation $\mathbf{a}\mathbf{b}^{\top}$ to denote the outer product $\mathbf{a} \otimes \mathbf{b}$ when $\mathbf{a}$ and $\mathbf{b}$ are real-valued. Schematically, we have the following:
$$\mathbf{a}\mathbf{b}^{\top}: \; (n \times 1)\,(1 \times m) = (n \times m) \qquad \text{(Eq. 15)}$$
You’ll recall that we could also calculate the outer product, $\mathbf{b} \otimes \mathbf{a}$, from the vectors $\mathbf{b}$ and $\mathbf{a}$. You will have guessed that we can also write this outer product as the matrix multiplication $\mathbf{b}\mathbf{a}^{\top}$ when $\mathbf{b}$ and $\mathbf{a}$ are real-valued. Again, this notation is more commonly used to represent the outer product, rather than $\mathbf{b} \otimes \mathbf{a}$.
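As a quick sketch with made-up vectors, the same outer product can be computed either with NumPy’s np.outer function or as an $n \times 1$ matrix times a $1 \times m$ matrix:

import numpy as np

a = np.array([1.0, 2.0, 3.0])        # n = 3
b = np.array([4.0, 5.0])             # m = 2

outer1 = np.outer(a, b)                               # 3x2 matrix
outer2 = np.matmul(a.reshape(3, 1), b.reshape(1, 2))  # same thing, via matmul

print(np.allclose(outer1, outer2))   # True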
Multiplying multiple matrices together
Once we know how to multiply two matrices together, it is a simple matter to multiply many matrices together – we simply take them two at a time. For example, if we have matrices $A$, $B$, and $C$, then their product $ABC$ can be calculated via the following:
$$ABC = (AB)\,C = A\,(BC) \qquad \text{(Eq. 16)}$$
We can either multiply $A$ and $B$ together first and then multiply the result by $C$, or we can multiply $B$ and $C$ together first and then multiply $A$ by the result. Either way, we get the same result. This means matrix multiplication is associative.
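As a quick numerical check of associativity (a sketch with randomly generated matrices):

import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((2, 3))
B = rng.standard_normal((3, 4))
C = rng.standard_normal((4, 2))

left = np.matmul(np.matmul(A, B), C)   # (AB)C
right = np.matmul(A, np.matmul(B, C))  # A(BC)

print(np.allclose(left, right))        # True, up to floating-point rounding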
Transforming a vector by matrix multiplication
So far, we have learned about vectors and matrices and their basic properties. We have also learned how to multiply matrices together. We have even seen how we can consider a vector as a special kind of matrix. This immediately raises the question of what happens if we multiply a vector by a matrix – what do we get and what does that multiplication represent?
Consider an $n \times m$ matrix $A$ and an $m$-component column vector $\mathbf{v}$. As we can think of the vector $\mathbf{v}$ as an $m \times 1$ matrix, we can clearly multiply $\mathbf{v}$ by $A$ using the rules of matrix multiplication. In fact, we get the following:
$$A\mathbf{v} = \begin{pmatrix} A_{11} & A_{12} & \cdots & A_{1m} \\ A_{21} & A_{22} & \cdots & A_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ A_{n1} & A_{n2} & \cdots & A_{nm} \end{pmatrix} \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_m \end{pmatrix} = \begin{pmatrix} A_{11}v_1 + A_{12}v_2 + \cdots + A_{1m}v_m \\ A_{21}v_1 + A_{22}v_2 + \cdots + A_{2m}v_m \\ \vdots \\ A_{n1}v_1 + A_{n2}v_2 + \cdots + A_{nm}v_m \end{pmatrix} \qquad \text{(Eq. 17)}$$
The result is an $n \times 1$ matrix, that is, an $n$-component column vector. So, multiplying a vector by a matrix gives us another vector. The components of this new vector are given by the expressions inside the brackets on the right-hand side of Eq. 17. The components of this new vector are (in general) different from those of vector $\mathbf{v}$, so the effect of multiplying a vector by a matrix is to transform that vector. From this, we can conclude that matrices represent transformations.
If we look more closely at the individual expressions in the vector on the right-hand side of Eq. 17, we can see that each component in the new vector is a linear combination of the components in the old vector $\mathbf{v}$. So, the matrix $A$ represents a linear transformation. The individual matrix elements $A_{11}$, $A_{12}$, and so on tell us the weights in those linear combinations that give us the components of the new vector. In other words, the individual matrix elements encode the details of the linear transformation.
One thing we haven’t spoken about yet is what effect the relative sizes of $n$ and $m$ have. If $n = m$, then obviously, multiplying an $m$-component vector $\mathbf{v}$ by the $m \times m$ matrix $A$ gives us another $m$-component vector. Although we have transformed the vector, we have, in this case, stayed within the same $m$-dimensional space. However, if $n < m$, then our new vector has fewer components than the starting vector $\mathbf{v}$, and so we have reduced the dimensionality. Alternatively, if $n > m$, our new vector has more components than we started with, and we have increased the dimensionality.
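As a sketch of how the shape of the matrix sets the dimensionality of the output (the matrices here are randomly generated purely for illustration):

import numpy as np

rng = np.random.default_rng(2)
v = np.array([1.0, 2.0, 3.0])           # an m-component vector, m = 3

A_same = rng.standard_normal((3, 3))    # n = m = 3: stays 3-dimensional
A_down = rng.standard_normal((2, 3))    # n = 2 < m: reduces dimensionality
A_up = rng.standard_normal((5, 3))      # n = 5 > m: increases dimensionality

print(np.matmul(A_same, v).shape)   # (3,)
print(np.matmul(A_down, v).shape)   # (2,)
print(np.matmul(A_up, v).shape)     # (5,)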
In all the examples previously, we have been multiplying a column vector by a matrix. But we can equally multiply a matrix by a row vector. Let’s stick with our vector $\mathbf{v}$, but we will use its row vector form $\mathbf{v}^{\top}$. Now we can think of $\mathbf{v}^{\top}$ as a $1 \times m$ matrix. So, if we have an $m \times p$ matrix $B$, then we can perform the matrix multiplication $\mathbf{v}^{\top}B$, and we get a $1 \times p$ matrix out of it, that is, a $p$-component row vector. As you might expect, this new $p$-component vector is just a linear transformation of our starting $m$-component vector $\mathbf{v}$, with the details of the linear transformation encoded in the matrix elements $B_{ij}$.
Finally, we should highlight that since matrix multiplication is a linear transformation, if we apply a matrix $A$ to a linear combination of vectors, the result is the same as combining the results of applying $A$ to each vector individually. In more detail, we have the following:
$$A\,(\alpha\mathbf{u} + \beta\mathbf{v}) = \alpha A\mathbf{u} + \beta A\mathbf{v} \qquad \text{(Eq. 18)}$$
We will make use of this fact shortly.
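As a quick numerical check of Eq. 18 (a sketch with randomly generated values):

import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((3, 3))
u = rng.standard_normal(3)
v = rng.standard_normal(3)
alpha, beta = 2.0, -0.5

lhs = np.matmul(A, alpha * u + beta * v)
rhs = alpha * np.matmul(A, u) + beta * np.matmul(A, v)

print(np.allclose(lhs, rhs))   # True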
The identity matrix
Now that we have learned that matrix multiplication represents the linear transformation of vectors, let’s look at some particular special cases of transformations. Consider the matrix $I$ given here:
$$I = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix} \qquad \text{(Eq. 19)}$$
The matrix $I$ has 1 for each matrix element along its diagonal and 0 everywhere else. Now, what is the effect of multiplying by $I$? Let’s try it. Consider an $n$-component column vector $\mathbf{v}$. If we multiply $\mathbf{v}$ by $I$, we get the result shown here:
$$I\mathbf{v} = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix} \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{pmatrix} = \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{pmatrix} = \mathbf{v} \qquad \text{(Eq. 20)}$$
So, multiplying any vector $\mathbf{v}$ by $I$ just gives us back $\mathbf{v}$ itself. We haven’t done anything to the starting vector. The transformation represented by $I$ is just the identity transformation, which leaves vectors untouched. Hence, $I$ is called the identity matrix. Or, more specifically, it is the $n$-dimensional identity matrix because it operates on $n$-component vectors.
It is a simple matter to confirm, via a similar calculation to the previous one, that if we reverse the order of the calculation, so that we multiply a row vector $\mathbf{v}^{\top}$ by $I$, we leave the row vector unchanged. In terms of math notation, we have the following:
$$\mathbf{v}^{\top} I = \mathbf{v}^{\top} \qquad \text{(Eq. 21)}$$
Now remember that when we explained matrix multiplication as a series of inner products, we learned that we could think of a matrix as a set of column vectors, so it is not surprising that when we multiply an $n \times n$ matrix $A$ by $I$, we leave the matrix untouched. In terms of the math, we have the following:
$$IA = A \qquad \text{(Eq. 22)}$$
Again, if we multiply them in the opposite order, we also leave the matrix unchanged, so in terms of the math, we have the following:
$$AI = A \qquad \text{(Eq. 23)}$$
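In NumPy, the identity matrix is produced by np.eye; as a brief sketch of Eq. 20, Eq. 22, and Eq. 23 (with made-up values):

import numpy as np

I = np.eye(3)                       # the 3-dimensional identity matrix
v = np.array([1.0, -2.0, 0.5])
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 9.0]])

print(np.allclose(np.matmul(I, v), v))   # I v = v
print(np.allclose(np.matmul(I, A), A))   # I A = A
print(np.allclose(np.matmul(A, I), A))   # A I = A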
The inverse matrix
If the identity matrix leaves an $n \times n$ matrix untouched, we can think of it as the matrix analog of multiplying a number by 1. For any number $x$ on the real number line, we have $1 \times x = x$ and $x \times 1 = x$. The number 1 here is called the identity element. For a number $x$, we also have the concept of its reciprocal, $x^{-1} = 1/x$, which is the number we multiply $x$ by to get the identity element, so that $x^{-1}$ is defined by the following relationship:
$$x^{-1}\,x = x\,x^{-1} = 1 \qquad \text{(Eq. 24)}$$
For an $n \times n$ matrix $A$, we have an analogous concept – the inverse matrix of $A$, which is denoted by the symbol $A^{-1}$. The matrix $A^{-1}$ is an $n \times n$ matrix and, as you might have guessed, is defined as the matrix we multiply $A$ by to get the identity element, the matrix $I$ in this case. So, $A^{-1}$ is defined by the following relationship:
$$A^{-1}A = AA^{-1} = I \qquad \text{(Eq. 25)}$$
Conceptually, we can think of $A^{-1}$ as playing a similar role and having similar properties to the reciprocal $x^{-1}$ in ordinary arithmetic. Just like the reciprocal in ordinary arithmetic, the inverse matrix can be extremely useful in simplifying mathematical expressions by canceling other terms out.
Note that the inverse matrix is only defined for square matrices. Non-square matrices do not have a proper inverse. However, not all square matrices necessarily have an inverse. That is, there are some square matrices, $A$, for which there are no solutions, $A^{-1}$, to the relation in Eq. 25. We will talk more about that later when we introduce eigen-decompositions of a square matrix.
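The code example at the end of this section uses NumPy’s np.linalg.inv function to compute an inverse. As a small sketch here, asking for the inverse of a square matrix that has no inverse (a singular matrix) makes NumPy raise an error:

import numpy as np

# A singular matrix: the second row is twice the first,
# so no inverse exists
S = np.array([[1.0, 2.0],
              [2.0, 4.0]])

try:
    np.linalg.inv(S)
except np.linalg.LinAlgError as err:
    print("No inverse:", err)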
More examples of matrices as transformations
Let’s look at another specific example of a matrix and understand its effect as a transformation. Consider the matrix $A$ given here:
Eq. 26
Clearly, matrix $A$ operates on two-component vectors that live in a two-dimensional plane. We can think of that plane as being the usual $(x, y)$ plane. What transformation does this represent? Let’s break it down. Let’s look at the effect of the transformation represented by $A$ on a specific vector. In this case, we’re going to choose the vector that represents the $x$ axis. In column vector form, this vector is as follows:
$$\mathbf{e}_x = \begin{pmatrix} 1 \\ 0 \end{pmatrix} \qquad \text{(Eq. 27)}$$
All other vectors representing points on the $x$ axis are just multiples of the vector in Eq. 27. Now, what is the effect of $A$ on this vector? It is easy to compute, and we find the following:
Eq. 28
The new vector on the right-hand side of Eq. 28 represents a point in the plane that has identical and positive $x$ and $y$ components. In other words, it represents a 45° anti-clockwise rotation of our starting point, which was on the $x$ axis.
Let’s look at the effect of $A$ on another vector. This time, we’re going to choose a vector that represents a point on the $y$ axis. In column vector form, this vector is as follows:
$$\mathbf{e}_y = \begin{pmatrix} 0 \\ 1 \end{pmatrix} \qquad \text{(Eq. 29)}$$
All other vectors representing points on the $y$ axis are just multiples of the vector in Eq. 29. The effect of $A$ on this vector is as follows:
Eq. 30
The new vector on the right-hand side of Eq. 30 represents a point in the second quadrant of the $(x, y)$ plane, and again represents a 45° anti-clockwise rotation of our starting point on the $y$ axis. The effect of matrix $A$ on the vectors $\mathbf{e}_x$ and $\mathbf{e}_y$ is illustrated schematically in Figure 3.4:

Figure 3.4: Schematic illustration of the effect of matrix A
Now, any two-dimensional vector can be written as a sum of the two vectors we have just studied. To show this, consider the following:
$$\begin{pmatrix} x \\ y \end{pmatrix} = x \begin{pmatrix} 1 \\ 0 \end{pmatrix} + y \begin{pmatrix} 0 \\ 1 \end{pmatrix} = x\,\mathbf{e}_x + y\,\mathbf{e}_y \qquad \text{(Eq. 31)}$$
Given that the effect of $A$ on both $\mathbf{e}_x$ and $\mathbf{e}_y$ is a 45° anti-clockwise rotation, and that $A$ is a linear transformation (Eq. 18), the effect of $A$ on any two-dimensional vector will be a 45° anti-clockwise rotation. Therefore, as a transformation, $A$ is a matrix that represents a 45° anti-clockwise rotation.
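The specific entries of the matrix in Eq. 26 aren’t reproduced here, but any matrix that rotates the plane by 45° anti-clockwise behaves as described. As a sketch, here is the standard 2D rotation matrix (with θ = 45°) applied to the two axis vectors:

import numpy as np

theta = np.pi / 4   # 45 degrees
# Standard 2D anti-clockwise rotation matrix
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

e_x = np.array([1.0, 0.0])
e_y = np.array([0.0, 1.0])

print(np.matmul(R, e_x))   # [0.707 0.707]  -- equal, positive x and y components
print(np.matmul(R, e_y))   # [-0.707 0.707] -- a point in the second quadrant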
Since any two-dimensional vector can be written as a linear combination of the vectors $\mathbf{e}_x$ and $\mathbf{e}_y$, these two vectors are called basis vectors – they provide a basis from which we can construct all other two-dimensional vectors. These two vectors are also orthogonal to each other. In geometric terms, this means they are at right angles to each other – this is obvious in this example because one vector lies along the $x$ axis while the other lies along the $y$ axis. In algebraic terms, orthogonality means the inner product between the two vectors is 0. Basis vectors don’t have to be orthogonal to each other. For example, two non-orthogonal vectors, such as $(1, 0)^{\top}$ and $(1, 1)^{\top}$, can also be used to describe any point on the $(x, y)$ plane. However, when basis vectors are orthogonal, they are easy to work with. Moving along one orthogonal basis vector does not change how far along we are on another orthogonal basis vector. For example, moving along the $x$ axis does not affect where we are on the $y$ axis. This means we can apply calculations along one orthogonal basis vector without having to worry about what is happening in terms of the other basis vectors. This makes orthogonal basis vectors very convenient to work with – a fact we will make use of when we move on to decompositions of matrices in the next section.
Given a set of orthogonal basis vectors $\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n$ in an $n$-dimensional space, we can easily work out how to represent any vector $\mathbf{v}$ in terms of those basis vectors. Say we have a vector $\mathbf{v}$ and we want to write it as follows:
$$\mathbf{v} = w_1\mathbf{e}_1 + w_2\mathbf{e}_2 + \cdots + w_n\mathbf{e}_n = \sum_{i=1}^{n} w_i\,\mathbf{e}_i \qquad \text{(Eq. 32)}$$
Then, we can work out the values of the weights $w_i$ by taking the inner product of both sides of Eq. 32 with each of the basis vectors $\mathbf{e}_j$. Doing so, we get the following:
$$\mathbf{e}_j \cdot \mathbf{v} = \sum_{i=1}^{n} w_i\,(\mathbf{e}_j \cdot \mathbf{e}_i) \qquad \text{(Eq. 33)}$$
Since $\mathbf{e}_j$ is, by definition, orthogonal to all the other basis vectors except itself, the inner products $\mathbf{e}_j \cdot \mathbf{e}_i = 0$ unless $i = j$. Plugging this fact into the preceding equation, we get the following:
$$w_j = \frac{\mathbf{e}_j \cdot \mathbf{v}}{\mathbf{e}_j \cdot \mathbf{e}_j} \qquad \text{(Eq. 34)}$$
So, we can easily work out the required weights. If the basis vectors are all of unit length, so that $\mathbf{e}_j \cdot \mathbf{e}_j = 1$ for every value of $j$, then the expression in Eq. 34 for the weights becomes even easier. It becomes $w_j = \mathbf{e}_j \cdot \mathbf{v}$. A set of orthogonal basis vectors that are of unit length are called orthonormal and form an orthonormal basis. Using an orthonormal basis to represent our vectors is extremely convenient. In the next section of this chapter, we will show how an orthonormal basis can be extracted from any matrix and therefore can be used as an extremely convenient way of working with that matrix. But for now, let’s look at how to do some of those matrix multiplications and transformations in a code example.
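First, though, here is a quick sketch of Eq. 34 in code, projecting a vector onto an orthonormal basis (the basis and vector here are invented purely for illustration):

import numpy as np

# An orthonormal basis for 2D: the axis vectors rotated by 45 degrees
e1 = np.array([1.0, 1.0]) / np.sqrt(2.0)
e2 = np.array([-1.0, 1.0]) / np.sqrt(2.0)

v = np.array([3.0, -1.0])

# Since the basis is orthonormal, the weights are just inner products
w1 = np.inner(e1, v)
w2 = np.inner(e2, v)

# Reconstruct v from the basis vectors and weights
print(np.allclose(w1 * e1 + w2 * e2, v))   # True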
Matrix transformation code example
For our code example, we’ll use the in-built functions in the NumPy package to do this. All the code examples that follow (and additional ones) can be found in the Code_Examples_Chap3.ipynb Jupyter notebook in the GitHub repository.
First, we’ll use the numpy.matmul function to multiply two matrices together:
import numpy as np

# Create 3x3 matrices
A = np.array([[1.0, 2.0, 1.0],
              [-2.5, 1.0, 0.0],
              [3.0, 1.0, 1.5]])
B = np.array([[1.0, -1.0, -1.0],
              [5.0, 2.0, 3.0],
              [3.0, 1.0, 2.0]])

# Multiply the matrices together
np.matmul(A, B)
The preceding code produces the following output:
array([[14. ,  4. ,  7. ],
       [ 2.5,  4.5,  5.5],
       [12.5,  0.5,  3. ]])
We can use the same NumPy function to multiply a vector by a matrix:
# Create a 4-dimensional vector
a = np.array([1.0, 2.0, 3.0, -2.0])

# Create a 3x4 matrix
A = np.array([[1.0, 1.0, 0.0, 1.0],
              [-2.0, 2.5, 1.5, 3.0],
              [0.0, 1.0, 1.0, 4.0]])

# We'll use the matrix multiplication function to calculate A*a
np.matmul(A, a)
We get the following output:
array([ 1. , 1.5, -3. ])
The NumPy package even has an in-built function for calculating the inverse of a matrix, as the following code demonstrates:
# Create a 4x4 square matrix
A = np.array([[1, 2, 3, 4],
              [2, 1, 2, 1],
              [0, 1, 3, 2],
              [1, 1, 2, 2]])

# Calculate and store the inverse matrix
Ainv = np.linalg.inv(A)

# Multiply the matrix by its inverse.
# We should get the identity matrix
# [[1,0,0,0], [0,1,0,0], [0,0,1,0], [0,0,0,1]]
# up to numerical precision
np.matmul(Ainv, A)
These simple code examples of matrix transformations bring this section neatly to a close, so let’s recap what we have learned in this section.
What we learned
In this section, we have learned the following:
- How to multiply matrices together
- How to multiply a vector by a matrix and vice versa
- What the identity matrix is and its effect on any other matrix
- What the inverse of a matrix is and why it is useful
- How a matrix represents a linear transformation
- How sets of orthonormal vectors provide a convenient basis on which we can express any other vector
Having learned the basics of matrix multiplication and how matrices represent transformations, we’ll now learn some standard ways of representing or decomposing matrices. These decompositions help us to understand in more detail the effect of a matrix and provide convenient ways to work with and manipulate matrices.