Exploring the nature of the data with the t-SNE method
Having visualized a few images and glimpsed how the samples are distributed, we will now go deeper into our EDA.
Each pixel comes with an intensity value, which makes 64 variables for each 8x8 image. The human brain is not good at intuitively perceiving dimensions higher than three. For high-dimensional data, we need more effective visual aids.
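To make the 64-variable count concrete, the following minimal sketch flattens one image into a feature vector. It assumes that the 8x8 images are the handwritten digits dataset bundled with scikit-learn (load_digits); the text here does not name the dataset, so treat that choice as an assumption.

```python
from sklearn.datasets import load_digits

# Assumption: the 8x8 images are scikit-learn's bundled digits dataset.
digits = load_digits()

# Each sample is stored as an 8x8 grid of pixel intensities...
print(digits.images.shape)

# ...and also as a flattened row of 8 * 8 = 64 variables.
print(digits.data.shape)

# Flattening the first image by hand gives the same 64-element vector.
first_image = digits.images[0]
flat = first_image.reshape(-1)
print(flat.shape)
```

Every image therefore corresponds to a point in a 64-dimensional space, which is what makes a direct plot impractical.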
Dimensionality reduction methods, such as the commonly used PCA and t-SNE, reduce the number of input variables under consideration, while retaining most of the useful information. As a result, the visualization of data becomes more intuitive.
In the following section, we will focus on the t-SNE method, using the scikit-learn library in Python.
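As a preview, here is a minimal sketch of reducing the 64 variables to two with scikit-learn's TSNE class. It again assumes the bundled digits dataset; the subset size of 500 and the random_state value are arbitrary choices made only to keep the example quick and repeatable.

```python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

# Assumption: the bundled digits dataset stands in for the images discussed here.
digits = load_digits()

# Use a subset of samples so that the sketch runs quickly (arbitrary choice).
X = digits.data[:500]
y = digits.target[:500]

# Project the 64 input variables down to 2 dimensions.
tsne = TSNE(n_components=2, random_state=42)
embedding = tsne.fit_transform(X)

# One 2D point per input sample, ready for a scatter plot colored by y.
print(embedding.shape)
```

Each row of embedding is a two-dimensional coordinate for one image, so the result can be drawn as an ordinary scatter plot with the points colored by their digit label.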
Understanding t-distributed stochastic neighbor embedding
The t-SNE method was proposed by van der Maaten and Hinton in 2008 in the publication Visualizing Data using t-SNE. It is a nonlinear dimension reduction method that aims to effectively visualize...