Principal Component Analysis
Principal Component Analysis (PCA) is a well-known algorithm for extracting features from high-dimensional data for use in machine learning (ML) models. In mathematical terms, dimension is the minimum number of coordinates required to specify a vector in a space. Computing the distance between two vectors in a high-dimensional space demands considerable computational power, and in such cases high dimensionality becomes a curse. Increasing the number of dimensions improves an algorithm's performance only up to a point, beyond which performance drops. This is the curse of dimensionality, shown in Figure 3.1, which limits the efficiency of most ML algorithms. The columns (variables or features) of a dataset represent the dimensions of the space, and the rows represent coordinates in that space. As the dimensionality of the data grows, sparsity increases and the computational effort required to calculate distances and densities grows exponentially...
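As a concrete illustration, the following is a minimal sketch of PCA-based feature extraction with scikit-learn; the synthetic 100-dimensional dataset and the choice of 10 retained components are assumptions made purely for demonstration, not part of the discussion above.

import numpy as np
from sklearn.decomposition import PCA

# Synthetic data: 500 rows (coordinates) in a 100-dimensional space (columns/features).
rng = np.random.default_rng(seed=0)
X = rng.normal(size=(500, 100))

# Keep the 10 directions of largest variance and project the data onto them.
pca = PCA(n_components=10)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                        # (500, 10): same rows, far fewer dimensions
print(pca.explained_variance_ratio_.sum())    # fraction of total variance retained

The reduced matrix can then be fed to an ML model in place of the original high-dimensional features, trading a small loss of variance for much cheaper distance and density computations.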