Implementing principal component analysis on multiple variables
Principal component analysis (PCA) is a popular dimensionality reduction method used to reduce the number of variables in large datasets. It does this by combining the original variables into new variables called principal components. Each component is a linear combination of the original variables. The components are uncorrelated with each other and are ordered so that the first few capture most of the variation in the original data.
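To make the idea of "combining variables" concrete, here is a minimal NumPy sketch of PCA computed from scratch. It assumes a small synthetic dataset; the three variables and their relationship are invented for illustration. The sketch centers the data, takes the eigenvectors of the covariance matrix as the principal directions, and projects the data onto them. The covariance matrix of the projected scores comes out diagonal, which means the components are uncorrelated. scikit-learn's PCA typically reaches the same decomposition through a singular value decomposition, so treat this only as an illustration of the underlying math.

```python
import numpy as np

# Synthetic data (invented for illustration): 200 samples of 3 variables,
# where the third variable is mostly a mix of the first two.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
x3 = 0.7 * x1 + 0.3 * x2 + rng.normal(scale=0.1, size=200)
X = np.column_stack([x1, x2, x3])

# Center each variable so the covariance matrix describes spread around the mean.
X_centered = X - X.mean(axis=0)

# Eigen-decomposition of the covariance matrix: eigenvectors are the
# principal directions, eigenvalues are the variance along each direction.
cov = np.cov(X_centered, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# eigh returns eigenvalues in ascending order; sort them in descending order.
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# Each principal component score is a weighted combination of the original variables.
scores = X_centered @ eigenvectors

# The components are uncorrelated: their covariance matrix is (numerically) diagonal.
score_cov = np.cov(scores, rowvar=False)
print(np.round(score_cov, 6))
```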
Even though PCA provides a simple way to analyze large datasets, it trades some accuracy for that simplicity. PCA doesn't reproduce the original data exactly, but it tries to preserve as much of its variance (information) as possible. In most cases, the result is close enough for us to glean insights from.
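The accuracy trade-off can be measured directly. The following sketch, again on invented synthetic data, keeps two of three components with scikit-learn's `PCA` and reports the share of variance retained through `explained_variance_ratio_`. It then uses `inverse_transform` to map the reduced data back to the original space, so the reconstruction error (what was lost) can be computed.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic data (invented for illustration): the third variable is
# nearly a linear mix of the first two.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
x3 = 0.7 * x1 + 0.3 * x2 + rng.normal(scale=0.1, size=200)
X = np.column_stack([x1, x2, x3])

# Keep only 2 of the 3 possible components.
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

# Share of the total variance retained by the kept components.
retained = pca.explained_variance_ratio_.sum()

# Map the reduced data back to the original 3-D space and measure what was lost.
X_restored = pca.inverse_transform(X_reduced)
mse = np.mean((X - X_restored) ** 2)

print(f"Variance retained: {retained:.3f}")
print(f"Reconstruction MSE: {mse:.4f}")
```

Because the third variable here is almost fully determined by the other two, dropping one component loses very little. On real data, the retained variance indicates how much information you are giving up.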
Now, we will explore how to implement PCA using the scikit-learn (sklearn) library.
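As a preview of the sklearn workflow, here is a minimal sketch that assumes a purely numeric feature matrix; the matrix below is randomly generated, not the Kaggle data. It standardizes the variables with `StandardScaler` before applying `PCA`, because PCA is sensitive to the scale of each variable. It also passes `n_components=0.95`, so scikit-learn keeps the smallest number of components that explains more than 95% of the variance.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical numeric feature matrix: 500 rows, 6 variables driven by
# 2 hidden factors plus a little noise (stand-in for real customer data).
rng = np.random.default_rng(42)
factors = rng.normal(size=(500, 2))
loadings = rng.normal(size=(2, 6))
X = factors @ loadings + rng.normal(scale=0.2, size=(500, 6))

# Standardize first so variables on larger scales don't dominate,
# then keep enough components to explain more than 95% of the variance.
pipeline = make_pipeline(StandardScaler(), PCA(n_components=0.95))
X_pca = pipeline.fit_transform(X)

# make_pipeline names each step after its lowercased class name.
pca = pipeline.named_steps["pca"]
print("Components kept:", pca.n_components_)
print("Explained variance ratio:", np.round(pca.explained_variance_ratio_, 3))
print("Reduced shape:", X_pca.shape)
```

The 95% threshold is a common starting point rather than a rule; the right number of components depends on the analysis.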
Getting ready
We will work with the Customer Personality Analysis dataset from Kaggle in this recipe. You can retrieve all the files from the GitHub repository.