Linear Discriminant Analysis for Feature Reduction
Linear discriminant analysis (LDA) maximizes class separation by projecting the data onto a new, lower-dimensional feature space with good class separability. Working in fewer dimensions helps to avoid the overfitting associated with the curse of dimensionality, and it also reduces computational cost. LDA can be used as a classification algorithm in its own right. The idea is to maximize the distance between the means of the classes (or categories) while minimizing the variability within each class. (This certainly sounds like how clustering algorithms in unsupervised learning work, but we will not cover them here because they are outside the scope of this book.) Note that LDA assumes that the data follow a Gaussian distribution; if they do not, the performance of LDA will degrade. In this section, we will use LDA as a feature reduction technique rather than as a classifier.
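To make the idea of "far-apart means, small within-class spread" concrete, here is a minimal sketch for the two-class case. It assumes NumPy and synthetic Gaussian data with equal class sizes. The function names fisher_direction and fisher_ratio, the class means, and the covariance matrix are illustrative choices, not taken from the text. The sketch computes the classic Fisher direction (within-class scatter inverse applied to the difference of class means) and uses it to reduce two features to one.

```python
import numpy as np

def fisher_direction(X, y):
    """Unit vector along S_W^{-1} (mu_1 - mu_0) for labels y in {0, 1}."""
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # Within-class scatter: sum of the two per-class scatter matrices.
    S_w = (X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)
    # Solve S_w w = (mu1 - mu0) instead of forming an explicit inverse.
    w = np.linalg.solve(S_w, mu1 - mu0)
    return w / np.linalg.norm(w)

def fisher_ratio(z, y):
    """Squared distance between projected class means over summed class variances."""
    z0, z1 = z[y == 0], z[y == 1]
    return (z1.mean() - z0.mean()) ** 2 / (z0.var() + z1.var())

# Synthetic data (an assumption for illustration): two Gaussian classes
# that share a strongly correlated covariance matrix.
rng = np.random.default_rng(0)
cov = np.array([[3.0, 2.5], [2.5, 3.0]])
X = np.vstack([
    rng.multivariate_normal([0.0, 0.0], cov, size=200),
    rng.multivariate_normal([2.0, 0.0], cov, size=200),
])
y = np.repeat([0, 1], 200)

w = fisher_direction(X, y)
z = X @ w  # each 2-D observation reduced to a single LDA feature

print("Fisher ratio along the LDA direction:", fisher_ratio(z, y))
print("Fisher ratio along the first raw feature:", fisher_ratio(X[:, 0], y))
```

On data like this, the ratio along the LDA direction should be at least as large as along any single raw feature, because the direction is chosen to maximize exactly that criterion. For more than two classes, or for production use, a library implementation is the more sensible choice.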
For the two-class problem, if we have an m-dimensional dataset with N observations, of which belongs to class and belongs...