Stochastic Neighbor Embedding (SNE)
SNE is one of a number of different methods that fall within the category of manifold learning, which aims to describe high-dimensional spaces within low-dimensional manifolds or bounded areas. At first thought, this seems like an impossible task; how can we reasonably represent data in two dimensions if we have a dataset with at least 30 features? As we work through the derivation of SNE, it is hoped that you will see how this is possible. Don't worry – we will not be covering the mathematical details of this process in great depth as it is outside of the scope of this chapter. Constructing an SNE can be divided into the following steps:
- Convert the distances between datapoints in the high-dimensional space into conditional probabilities. Say we had two points, xi and xj, in a high-dimensional space and we wanted to determine the probability (pi|j) that xj would be picked as a neighbor of xi. To define this probability, we use...