Principal components analysis (PCA)
Principal components analysis transforms an original set of features into a new set of features, the principal components, ordered by decreasing variance. PCA enables the data scientist to select the features that have the most impact on the classification or prediction, that is, the components with the highest variance.
The original observations (vectors of feature values) are transformed into a set of new variables that are linearly uncorrelated with one another.
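The transformation can be summarized in standard linear-algebra notation. This is a minimal sketch, and the symbols are introduced here rather than taken from the text: X is an m×n matrix of m observations of n features with each column centered on its mean, Σ is the sample covariance matrix, W is the orthogonal matrix of its eigenvectors, and Λ holds the eigenvalues in decreasing order.

```latex
\begin{aligned}
\Sigma &= \frac{1}{m-1} X^\top X = W \Lambda W^\top,
  \qquad \Lambda = \operatorname{diag}(\lambda_1, \dots, \lambda_n),
  \quad \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_n \ge 0 \\
Z &= X W \\
\operatorname{Cov}(Z) &= \frac{1}{m-1} Z^\top Z = W^\top \Sigma W
  = W^\top W \Lambda W^\top W = \Lambda
\end{aligned}
```

Because Cov(Z) is diagonal, the new variables (the columns of Z) are uncorrelated, and the kth column has variance λ_k. Keeping only the first k columns of W projects the data onto the k components with the highest variance.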
Let's consider a model with two features {x, y} and a set of observations {(x_i, y_i)} plotted in the following chart:
The features x and y are transformed into two new variables, X and Y, by a rotation of the axes that aligns them with the distribution of the observations. The variable with the highest variance is known as the first principal component. In the general case of multiple features, the variable with the nth highest variance is known as the nth principal component. The...
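To make the rotation concrete for the two-feature case, here is a minimal Python sketch. The function name `pca_2d` and the sample points are hypothetical illustrations, not from the text. It centers the observations, computes the 2×2 sample covariance matrix, and uses the closed-form angle of the principal eigenvector of a symmetric 2×2 matrix, θ = ½·atan2(2·s_xy, s_xx − s_yy), to rotate (x, y) into (X, Y).

```python
import math


def pca_2d(points):
    """Rotate 2-D observations onto their principal axes (hypothetical helper).

    Returns (theta, (var_x, var_y), rotated): the rotation angle in radians,
    the sample variances of the new variables X and Y (var_x >= var_y), and
    the rotated, mean-centered observations (X_i, Y_i).
    """
    m = len(points)
    mean_x = sum(x for x, _ in points) / m
    mean_y = sum(y for _, y in points) / m
    centered = [(x - mean_x, y - mean_y) for x, y in points]

    # Sample covariance matrix [[sxx, sxy], [sxy, syy]]
    sxx = sum(x * x for x, _ in centered) / (m - 1)
    syy = sum(y * y for _, y in centered) / (m - 1)
    sxy = sum(x * y for x, y in centered) / (m - 1)

    # Angle of the eigenvector belonging to the largest eigenvalue
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    c, s = math.cos(theta), math.sin(theta)

    # Rotate the axes: X lies along the first principal direction,
    # Y along the orthogonal (second) direction
    rotated = [(c * x + s * y, -s * x + c * y) for x, y in centered]

    var_x = sum(u * u for u, _ in rotated) / (m - 1)
    var_y = sum(v * v for _, v in rotated) / (m - 1)
    return theta, (var_x, var_y), rotated


if __name__ == "__main__":
    # Hypothetical observations {(x_i, y_i)}
    pts = [(2.5, 2.4), (0.5, 0.7), (2.2, 2.9), (1.9, 2.2), (3.1, 3.0),
           (2.3, 2.7), (2.0, 1.6), (1.0, 1.1), (1.5, 1.6), (1.1, 0.9)]
    theta, (var_x, var_y), rotated = pca_2d(pts)
    print(f"rotation: {math.degrees(theta):.2f} degrees")
    print(f"variance of X (1st component): {var_x:.4f}")
    print(f"variance of Y (2nd component): {var_y:.4f}")
```

The rotation preserves the total variance (var_x + var_y equals s_xx + s_yy) and leaves X and Y with zero sample covariance up to rounding. The closed-form angle only applies to two features; with n features, the usual approach is an eigendecomposition or singular value decomposition of the n×n covariance matrix.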