4. Dimensionality Reduction Techniques and PCA
Activity 4.01: Manual PCA versus scikit-learn
Solution:
- Import the pandas, numpy, and matplotlib plotting libraries and the scikit-learn PCA model:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
- Load the dataset and select only the A and LK columns, as in the previous exercises. Display the first five rows of the data:
df = pd.read_csv('../Seed_Data.csv')
df = df[['A', 'LK']]
df.head()
The output is as follows:
- Compute the covariance matrix for the data:
cov = np.cov(df.values.T)
cov
The output is as follows:
array([[8.46635078, 1.22470367],
       [1.22470367, 0.19630525]])
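To make clear what np.cov returns here, the following minimal sketch recomputes a covariance matrix by hand and compares it with the NumPy result. It uses synthetic two-column data as a stand-in (an assumption, since Seed_Data.csv is not reproduced here), and the names X, cov_numpy, and cov_manual are illustrative only.

```python
import numpy as np

# Synthetic stand-in for two feature columns (NOT the seeds data)
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2)) @ np.array([[3.0, 0.4], [0.0, 0.3]])

# np.cov treats each row as a variable, hence the transpose
cov_numpy = np.cov(X.T)

# The same quantity by hand: centre each column,
# then compute X_c^T X_c / (n - 1)
X_centered = X - X.mean(axis=0)
cov_manual = X_centered.T @ X_centered / (X.shape[0] - 1)

print(np.allclose(cov_numpy, cov_manual))
```

The transpose matters because np.cov defaults to treating rows as variables, while a DataFrame's values array stores one sample per row.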
- Transform the data using the scikit-learn API and only the first principal component. Store the transformed data in the sklearn_pca variable:
model = PCA(n_components=1)
sklearn_pca...
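As a rough illustration of the comparison this activity is built around, the sketch below projects synthetic data onto its first principal component twice: once with scikit-learn and once manually through an eigendecomposition of the covariance matrix. The synthetic data, the use of fit_transform, and the names sk_projection and manual_projection are assumptions made for this example, not the activity's own code.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-in for two feature columns (NOT the seeds data)
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2)) @ np.array([[3.0, 0.4], [0.0, 0.3]])

# scikit-learn: fit the model and project onto the first component
model = PCA(n_components=1)
sk_projection = model.fit_transform(X)  # shape (50, 1)

# Manual: eigendecomposition of the covariance matrix
X_centered = X - X.mean(axis=0)
eigenvalues, eigenvectors = np.linalg.eigh(np.cov(X_centered.T))
first_component = eigenvectors[:, np.argmax(eigenvalues)]
manual_projection = X_centered @ first_component  # shape (50,)

# A principal component is only defined up to sign,
# so compare the magnitudes of the two projections
matches = np.allclose(np.abs(sk_projection[:, 0]), np.abs(manual_projection))
print(matches)
```

The two projections can differ by an overall sign, because an eigenvector and its negation describe the same axis; that is why the comparison is made on absolute values.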