Choosing the number of principal components
Not all principal components carry useful information, so we don’t need to keep every component to get a good representation of our data; a few are often enough. A scree plot helps identify the most useful ones. It plots each component against its proportion of explained variance, that is, the share of the information in the original variables that the component holds. The proportion of explained variance is the variance of a component divided by the sum of the variances of all components. Higher is generally better, because a component with a higher proportion captures more of the variation in the original variables. As a rule of thumb, a cumulative explained variance of about 75% across the components we keep is a reasonable target.
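To make the arithmetic concrete, here is a minimal sketch in plain Python. It assumes we already have the variance of each component; the numbers are made up for illustration, and the function names `explained_variance_ratio` and `components_for_target` are hypothetical helpers, not part of any library.

```python
from itertools import accumulate


def explained_variance_ratio(variances):
    """Return each component's variance divided by the total variance."""
    total = sum(variances)
    return [v / total for v in variances]


def components_for_target(variances, target=0.75):
    """Return the smallest number of leading components whose cumulative
    proportion of explained variance reaches the target."""
    # Components are ranked from largest to smallest variance.
    ratios = explained_variance_ratio(sorted(variances, reverse=True))
    for k, cumulative in enumerate(accumulate(ratios), start=1):
        if cumulative >= target:
            return k
    # Fall back to keeping everything if rounding keeps us just below the target.
    return len(ratios)


# Illustrative (made-up) component variances, not taken from real data.
variances = [4.0, 2.0, 1.0, 0.5, 0.5]

ratios = explained_variance_ratio(variances)
k = components_for_target(variances, target=0.75)

print("Proportion of explained variance:", ratios)
print("Components needed for 75%:", k)
```

The values in `ratios` are what a scree plot would show on its vertical axis, one point per component. The 75% threshold is kept as a parameter because it is a rule of thumb rather than a fixed requirement.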
The variance of a component is derived from the eigenvalue of that component. In simple terms, the eigenvalues give us a sense of how much variance (information...