An application to real-world data
In this section, we will apply PCA to the MNIST dataset. MNIST is one of the most famous datasets in machine learning and contains images of handwritten digits that are widely used to train image processing algorithms. We will be using version 1 of the dataset, where each digit image is stored as a flat vector of 784 features, one per pixel. For visualization purposes, we will reshape these features into a 28 x 28 matrix. Each element of this matrix is a number between 0 (white) and 255 (black).
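As a small illustration of the 784-to-28 x 28 mapping described above, the following sketch reshapes a synthetic stand-in vector rather than a real MNIST image, so it runs without downloading anything. The names flat_image and image are hypothetical, and the sketch assumes the pixels are stored row by row:

```python
import numpy as np

# Synthetic stand-in for one MNIST row: 784 values in the range 0-255.
# This is NOT real MNIST data; it only mimics the layout.
flat_image = (np.arange(784) % 256).astype(np.uint8)

# Reshape the flat vector into a 28 x 28 grid (row-major order).
image = flat_image.reshape(28, 28)

# Under row-major order, flat index k lands at row k // 28, column k % 28.
k = 100
row, col = divmod(k, 28)
assert image[row, col] == flat_image[k]

print(image.shape)
```

The same reshape call should work on a single row of the real data, provided that row is a NumPy array of length 784.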
The first step is to import the data, as shown in the following code. The download can take some time because the dataset is large, so hang tight. The dataset contains 70,000 images of the digits 0-9, and each image has 784 features:
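# Importing the dataset
from sklearn.datasets import fetch_openml

# as_frame=False returns NumPy arrays rather than a pandas DataFrame,
# which recent scikit-learn versions return by default for this dataset
mnist_data = fetch_openml('mnist_784', version=1, as_frame=False)

# Choosing the independent (X) and dependent (y) variables
X, y = mnist_data["data"], mnist_data["target"]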
Now...