t-Distributed SNE
t-SNE addresses the crowding problem in two ways: it uses a modified version of the KL divergence cost function, and it replaces the Gaussian distribution with the Student's t-distribution in the low-dimensional space. The Student's t-distribution is a bell-shaped probability distribution much like the Gaussian, but with heavier tails. It is typically used when the sample size is small and the population standard deviation is unknown, most familiarly in the Student's t-test.
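To make the "heavy tail" concrete, here is a minimal sketch that compares an unnormalized Gaussian kernel with an unnormalized Student's t kernel with one degree of freedom (the form used by t-SNE in the low-dimensional space). The function names, the choice of sigma = 1, and the sample distances are assumptions made for illustration only.

```python
import numpy as np

def gaussian_kernel(d, sigma=1.0):
    # Unnormalized Gaussian similarity for a distance d (sigma is an assumed bandwidth).
    return np.exp(-d**2 / (2.0 * sigma**2))

def student_t_kernel(d):
    # Unnormalized Student's t similarity with one degree of freedom.
    return 1.0 / (1.0 + d**2)

# Illustrative distances, from identical points to far-apart points.
distances = np.array([0.0, 1.0, 2.0, 4.0, 8.0])
gauss = gaussian_kernel(distances)
student = student_t_kernel(distances)

for d, g, t in zip(distances, gauss, student):
    print(f"d={d:4.1f}  gaussian={g:.3e}  student_t={t:.3e}")
```

Both kernels equal 1 at distance 0, but the Gaussian value decays exponentially while the Student's t value decays only polynomially, so far-apart points keep a non-negligible similarity under the t kernel.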
The modified KL cost function treats the pairwise similarities symmetrically, so each pair of points contributes in the same way regardless of direction, while the heavy tail of the Student's t-distribution in the low-dimensional space counters the crowding problem. In the high-dimensional space, the Gaussian distribution is still used to compute the pairwise probabilities. Pairing it with the heavier-tailed distribution in the low-dimensional space lets a moderate distance in the higher dimensions be represented by a moderate or larger distance in the lower dimensions, rather than being crushed together with the short distances. This combination of different distributions in the respective spaces allows for the faithful representation of datapoints...
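The combination described above can be sketched in a few lines of NumPy: Gaussian-based affinities in the high-dimensional space, Student's t affinities in the low-dimensional space, and the KL divergence between them as the cost. This is a simplified illustration, not a full t-SNE implementation: it assumes a single fixed bandwidth sigma for every point (real t-SNE tunes a per-point bandwidth from a perplexity setting), random data, and a random embedding with no optimization step. All function and variable names are hypothetical.

```python
import numpy as np

def squared_distances(X):
    # Pairwise squared Euclidean distances between the rows of X.
    diff = X[:, None, :] - X[None, :, :]
    return np.sum(diff**2, axis=-1)

def high_dim_affinities(X, sigma=1.0):
    # Gaussian conditional probabilities p_{j|i}, using one assumed sigma for all points.
    D = squared_distances(X)
    P_cond = np.exp(-D / (2.0 * sigma**2))
    np.fill_diagonal(P_cond, 0.0)  # a point is not its own neighbor
    P_cond /= P_cond.sum(axis=1, keepdims=True)
    # Symmetrize into joint probabilities p_ij that sum to 1.
    n = X.shape[0]
    return (P_cond + P_cond.T) / (2.0 * n)

def low_dim_affinities(Y):
    # Student's t kernel (one degree of freedom), normalized over all pairs.
    D = squared_distances(Y)
    W = 1.0 / (1.0 + D)
    np.fill_diagonal(W, 0.0)
    return W / W.sum()

def kl_divergence(P, Q, eps=1e-12):
    # KL(P || Q), summed over pairs with nonzero p_ij.
    mask = P > 0
    return np.sum(P[mask] * np.log(P[mask] / np.maximum(Q[mask], eps)))

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 10))  # assumed toy data: 20 points in 10 dimensions
Y = rng.normal(size=(20, 2))   # assumed random 2-D embedding, not optimized

P = high_dim_affinities(X)
Q = low_dim_affinities(Y)
cost = kl_divergence(P, Q)
print(f"KL(P || Q) = {cost:.4f}")
```

In an actual run, the embedding Y would be updated by gradient descent to reduce this cost; here it is only evaluated once to show how the two distributions enter the calculation.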