Random matrices and high-dimensional covariance matrices
The examples of large random matrices in the previous section were all square matrices. However, in real-world data science, not all matrices are square. Take the data matrix $X$ that we encountered in Chapter 3 when doing Principal Component Analysis (PCA). It is an $n \times p$ matrix, where $n$ is the number of data points and $p$ is the number of features. We will assume, for this section, that the data has already been mean-centered, so that the sum of each column of $X$ is 0.
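Mean-centering amounts to subtracting each column's mean from that column. A minimal NumPy sketch (using small, randomly generated data purely for illustration):

```python
import numpy as np

# Hypothetical data matrix: n = 5 samples, p = 3 features
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))

# Mean-center: subtract each column's mean from that column
X_centered = X - X.mean(axis=0)

# Each column of the centered matrix now sums to (numerically) zero
print(np.allclose(X_centered.sum(axis=0), 0.0))  # True
```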
The matrix $X$ is what we use to do PCA. It is also the design matrix that we use when building statistical models. So, the matrix $X$ is non-square (unless $n = p$). However, in practice, we usually derive a square matrix from $X$. For example, when doing PCA, we would calculate the sample covariance matrix $S$, which is defined as follows:
$$S = \frac{1}{n-1} X^{\top} X \tag{10}$$
The matrix $S$ in Eq.10 is $p \times p$ and symmetric. If we had many features, it would be a large matrix. Since $S$ is derived from our data, which contains...
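The sample covariance matrix can be computed in a couple of lines. A minimal NumPy sketch (assuming the usual convention $S = X^{\top}X/(n-1)$ for mean-centered data, with randomly generated data standing in for a real dataset) that also checks the shape and symmetry claims above:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 100, 4                     # n data points, p features
X = rng.normal(size=(n, p))
X = X - X.mean(axis=0)            # mean-center each column

# Sample covariance matrix: S = X^T X / (n - 1)
S = X.T @ X / (n - 1)

print(S.shape)                                   # (4, 4): S is p x p
print(np.allclose(S, S.T))                       # True: S is symmetric
print(np.allclose(S, np.cov(X, rowvar=False)))   # True: matches NumPy's np.cov
```

Note that `np.cov(X, rowvar=False)` treats rows as observations and also divides by $n-1$ by default, so it agrees with the hand-rolled computation.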