Principal component analysis (PCA) is a way to reduce the number of dimensions in a dataset; you can think of it as a form of compression. Suppose your dataset has 100 variables. Many of them may be correlated with each other. If so, most of the variation in the data can be explained by combining the variables into a smaller set of new variables. PCA performs this task: it finds linear combinations of your input variables (the principal components), ordered so that each one explains as much of the remaining variation as possible, and reports how much variation each combination explains.
PCA is a method for reducing the number of dimensions in a dataset: in effect, it summarizes the data so that you can focus on the few combinations of features that explain most of the variation.
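To make the idea concrete, here is a minimal sketch of PCA using only NumPy. It is an illustration under stated assumptions, not a reference implementation: the synthetic dataset (10 observed variables driven by 2 hidden factors plus a little noise) and the helper name `pca_explained_variance` are made up for this example.

```python
import numpy as np

def pca_explained_variance(X):
    """Hypothetical helper: principal directions and the share of variance each explains."""
    # Center each column so variation is measured around the mean.
    Xc = X - X.mean(axis=0)
    # Covariance matrix of the variables (columns of X).
    cov = np.cov(Xc, rowvar=False)
    # eigh is meant for symmetric matrices; it returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Reorder so the direction with the largest variance comes first.
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Fraction of total variance explained by each component.
    ratio = eigvals / eigvals.sum()
    return eigvecs, ratio

# Assumed toy data: 10 correlated variables generated from 2 hidden factors.
rng = np.random.default_rng(0)
n = 500
latent = rng.normal(size=(n, 2))
mixing = rng.normal(size=(2, 10))
X = latent @ mixing + 0.05 * rng.normal(size=(n, 10))

components, ratio = pca_explained_variance(X)
print(np.round(ratio, 3))

# Project the data onto the first two components: 10 columns become 2.
X_reduced = (X - X.mean(axis=0)) @ components[:, :2]
print(X_reduced.shape)
```

Because the ten columns are built from only two underlying factors, the first two entries of `ratio` should account for nearly all of the variation. In practice you would more likely reach for a library routine such as scikit-learn's `sklearn.decomposition.PCA`, which reports the same quantity as `explained_variance_ratio_`.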
PCA can be useful for machine learning in two ways:
- It can be a useful preprocessing step before...