You're reading from 15 Math Concepts Every Data Scientist Should Know: Understand and learn how to apply the math behind data science algorithms

Product type Paperback

Published in Aug 2024

Publisher Packt

ISBN-13 9781837634187

Length 510 pages

Edition 1st Edition

Languages Python

Tools NumPy

Concepts Data Science

Author (1): David Hoyle

Random matrices and high-dimensional covariance matrices

The examples of large random matrices in the previous section were all square matrices. However, in real-world data science, not all matrices are square. Take the data matrix $\underline{\underline{X}}$ that we encountered in Chapter 3 when doing Principal Component Analysis (PCA). It is an $N \times d$ matrix, where $N$ is the number of data points and $d$ is the number of features. We will assume, for this section, that the data has already been mean-centered, so that the sum of each column of $\underline{\underline{X}}$ is 0.

The $\underline{\underline{X}}$ matrix is what we use to do PCA. It is also the design matrix that we use when building statistical models. In general, $\underline{\underline{X}}$ is non-square (unless $N = d$). However, in practice, we usually derive a square matrix from $\underline{\underline{X}}$. For example, when doing PCA, we would calculate the sample covariance matrix $\hat{\underline{\underline{C}}}$, which is defined as follows:

$$\hat{\underline{\underline{C}}} = \frac{1}{N-1}\,\underline{\underline{X}}^{\top}\,\underline{\underline{X}}$$

Eq.10
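As a quick check of Eq.10, the following is a minimal NumPy sketch rather than code from the book. The sizes `N` and `d`, the seed, the Gaussian test data, and the variable names `X_raw`, `X`, and `C_hat` are all assumptions chosen for illustration. It mean-centers a data matrix, forms the sample covariance matrix as in Eq.10, and compares the result with NumPy's built-in `np.cov`.

```python
import numpy as np

# Illustrative (assumed) setup: N data points, d features, Gaussian entries
rng = np.random.default_rng(0)
N, d = 200, 5
X_raw = rng.normal(size=(N, d))

# Mean-center each column so that every column of X sums to zero
X = X_raw - X_raw.mean(axis=0)

# Sample covariance matrix as defined in Eq.10
C_hat = X.T @ X / (N - 1)

print(C_hat.shape)                                       # (5, 5), i.e. d x d
print(np.allclose(C_hat, C_hat.T))                       # True: C_hat is symmetric
print(np.allclose(C_hat, np.cov(X_raw, rowvar=False)))   # True: matches np.cov
```

Passing `rowvar=False` tells `np.cov` that the columns are the features, and its default normalization is also $N-1$, which is why the two results agree.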

The $\hat{\underline{\underline{C}}}$ matrix in Eq.10 is $d \times d$ and symmetric. If we had many features, it would be a large matrix. Since $\hat{\underline{\underline{C}}}$ is derived from our data, which contains...