The scikit-learn implementation of PCA
In this section, we will apply PCA to the pizza.csv dataset (which we explored in the first section of this chapter) using the scikit-learn library's PCA class.
As discussed in the previous section, there are two ways of choosing how many principal components to keep, and the choice depends on your goal: either reduce the data to two or three dimensions so that it can be plotted, or keep enough principal components to retain a chosen proportion of the variance.
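Both choices can be expressed through the n_components argument of scikit-learn's PCA class: an integer asks for exactly that many components, while a float between 0 and 1 asks for as many components as are needed to reach that share of the variance. The following is a minimal sketch rather than the chapter's own code. It runs on a synthetic 7-feature array (an assumption, used only so that the example is self-contained), standardizes the features first (also an assumption), and uses 0.95 as an arbitrary example threshold.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a table with 7 numeric features.
# This is NOT the pizza data; it only makes the sketch runnable.
rng = np.random.default_rng(0)
latent = rng.normal(size=(300, 3))
mixing = rng.normal(size=(3, 7))
X = latent @ mixing + 0.1 * rng.normal(size=(300, 7))

# Assumption: features are standardized before PCA so that no single
# feature dominates just because of its scale.
X_scaled = StandardScaler().fit_transform(X)

# Option 1: an integer keeps exactly that many components (here 2,
# which is convenient for a 2D plot).
pca_fixed = PCA(n_components=2)
X_2d = pca_fixed.fit_transform(X_scaled)

# Option 2: a float in (0, 1) keeps the smallest number of components
# whose cumulative explained variance exceeds that fraction.
# svd_solver="full" is passed explicitly to make this behaviour unambiguous.
pca_var = PCA(n_components=0.95, svd_solver="full")
X_var = pca_var.fit_transform(X_scaled)

print("Fixed-size projection shape:", X_2d.shape)
print("Components kept for 95% variance:", pca_var.n_components_)
print("Variance retained:", pca_var.explained_variance_ratio_.sum())
```

After fitting, the n_components_ attribute reports how many components the variance-based option actually kept, and explained_variance_ratio_ gives the share of variance carried by each one.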
First, we will implement the method in which we specify the number of principal components to keep. We will reduce the 7-dimensional pizza dataset to two principal components so that we can see, in a single 2D plot, how the pizzas produced by 10 different companies differ in nutritional content, rather than trying to compare and visualize the data in higher dimensions...
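As a rough illustration of this first method, the sketch below projects a labelled table onto two principal components. It is not the chapter's implementation: the helper project_to_2d is hypothetical, the demo table is synthetic (10 made-up brands with 7 made-up numeric features), and the column names are placeholders rather than the actual columns of pizza.csv. Standardizing the features before PCA is likewise an assumption.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def project_to_2d(df, label_col):
    """Hypothetical helper: standardize the numeric columns of df and
    project them onto the first two principal components."""
    features = df.drop(columns=[label_col]).select_dtypes("number")
    scaled = StandardScaler().fit_transform(features)
    pca = PCA(n_components=2)
    coords = pca.fit_transform(scaled)
    out = pd.DataFrame(coords, columns=["PC1", "PC2"], index=df.index)
    out[label_col] = df[label_col]  # keep the label for colouring a plot
    return out, pca


# Synthetic stand-in data: 10 brands, 30 rows each, 7 numeric features.
# Brand and feature names are placeholders, not the real pizza.csv columns.
rng = np.random.default_rng(42)
brands = np.repeat([f"brand_{i}" for i in range(10)], 30)
centers = rng.normal(scale=3.0, size=(10, 7))
values = np.repeat(centers, 30, axis=0) + rng.normal(size=(300, 7))
demo = pd.DataFrame(values, columns=[f"feature_{j}" for j in range(7)])
demo["brand"] = brands

projected, pca = project_to_2d(demo, "brand")
print(projected.head())
print("Explained variance ratio:", pca.explained_variance_ratio_)
```

The returned PC1 and PC2 columns can be passed to any scatter-plot routine, with points coloured by the label column. With a real file, any non-feature numeric column (such as a row identifier, if one exists) would need to be dropped before calling the helper.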