6. t-Distributed Stochastic Neighbor Embedding
Activity 6.01: Wine t-SNE
Solution:
- Import pandas, numpy, and matplotlib, as well as the t-SNE and PCA models from scikit-learn:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
- Load the Wine dataset from the wine.data file included in the accompanying source code and display the first five rows of data:
df = pd.read_csv('wine.data', header=None)
df.head()
The output is as follows:
- The first column contains the labels; extract this column and remove it from the dataset:
labels = df[0]
del df[0]
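If wine.data is not at hand, scikit-learn ships a copy of the UCI Wine data through `sklearn.datasets.load_wine`. The sketch below assumes that this bundled copy is an acceptable stand-in for the file. It first rebuilds a DataFrame in the same layout: integer column names, with the class label in column 0 shifted from scikit-learn's 0 to 2 codes to the file's 1 to 3. It then separates the labels with `DataFrame.pop`, which returns column 0 and deletes it from `df` in one call. This has the same effect as `labels = df[0]` followed by `del df[0]`.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_wine

# Assumption: scikit-learn's bundled Wine data stands in for wine.data.
wine = load_wine()

# Rebuild the wine.data layout: integer column names 0..13, with the
# class label in column 0 (shifted from 0-2 to 1-3 to mirror the file).
df = pd.DataFrame(np.column_stack([wine.target + 1, wine.data]))
df[0] = df[0].astype(int)

# pop() returns column 0 and removes it from df in a single call,
# equivalent to `labels = df[0]` followed by `del df[0]`.
labels = df.pop(0)

print(labels.value_counts().sort_index())
print(df.shape)  # (178, 13)
```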
- Run PCA to reduce the dataset to its first six principal components:
model_pca = PCA(n_components=6)
wine_pca = model_pca.fit_transform(df)
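With its default `whiten=False`, `PCA.fit_transform` centres each column and then projects the data onto the rows of `components_`. This means `wine_pca` can be read as coordinates along the six principal axes. The sketch below checks that equivalence on random stand-in data shaped like the Wine table (an assumption; any numeric matrix would do). It also checks that the principal axes are orthonormal.

```python
import numpy as np
from sklearn.decomposition import PCA

# Stand-in data (assumption): 178 samples x 13 features with unequal scales.
rng = np.random.default_rng(0)
X = rng.normal(size=(178, 13)) * rng.uniform(0.5, 5.0, size=13)

model_pca = PCA(n_components=6)
X_pca = model_pca.fit_transform(X)

# fit_transform is equivalent to centring the data and then projecting it
# onto the principal axes stored as the rows of components_.
manual = (X - model_pca.mean_) @ model_pca.components_.T

print(X_pca.shape)  # (178, 6)
print(np.allclose(X_pca, manual))  # True
```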
- Determine the proportion of the variance in the data that these six components explain:
np.sum(model_pca.explained_variance_ratio_)
The output...
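The value returned by `np.sum(model_pca.explained_variance_ratio_)` is a fraction between 0 and 1. Each entry of `explained_variance_ratio_` is one component's variance divided by the total variance of the centred data. The minimal sketch below uses random stand-in data (an assumption, not the Wine table). It recomputes those ratios from the eigenvalues of the sample covariance matrix, then uses `np.cumsum` to show how much variance is retained as components are added one at a time.

```python
import numpy as np
from sklearn.decomposition import PCA

# Stand-in data (assumption) with unequal column scales.
rng = np.random.default_rng(1)
X = rng.normal(size=(178, 13)) * rng.uniform(0.5, 5.0, size=13)

model_pca = PCA(n_components=6).fit(X)

# Eigenvalues of the sample covariance matrix (np.cov uses ddof=1),
# reordered from largest to smallest.
eigvals = np.linalg.eigvalsh(np.cov(X, rowvar=False))[::-1]
ratio_manual = eigvals[:6] / eigvals.sum()

print(np.allclose(model_pca.explained_variance_ratio_, ratio_manual))  # True
# Running total: fraction of variance kept with 1, 2, ..., 6 components.
print(np.cumsum(model_pca.explained_variance_ratio_))
print(np.sum(model_pca.explained_variance_ratio_))
```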