The divergences
Fundamentally, divergences are algorithms that compute the similarity (or dissimilarity) between two probability distributions. In information theory, divergences are used to estimate the minimum discrimination information.
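As a quick illustration, here is a minimal sketch of the discrete KL divergence for two distributions given as arrays; the function name `kl_divergence` and the `eps` smoothing constant are illustrative choices rather than part of any particular library:

```python
import numpy as np

def kl_divergence(p, q, eps: float = 1e-12) -> float:
    """Discrete Kullback-Leibler divergence D(p || q), in nats."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p /= p.sum()                      # normalize into probability distributions
    q /= q.sum()
    # eps guards against log(0) when either distribution has empty bins
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

p = [0.1, 0.4, 0.4, 0.1]
q = [0.25, 0.25, 0.25, 0.25]
print(kl_divergence(p, q))            # > 0: the distributions differ
print(kl_divergence(p, p))            # 0.0: identical distributions
```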
Although divergences are not usually defined as dimension-reduction techniques, they are a vital tool for measuring the redundancy of information between features.
Let's consider a set of observations, X, with a feature set {fi}. Two features that are highly correlated generate redundant information (little additional information gain). Therefore, it is conceivable to remove one of the two features from the training set without incurring a significant loss of information.
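To make the redundancy argument concrete, the sketch below estimates the mutual information between pairs of features from a 2D histogram of their values; the bin count, the synthetic features, and the name `mutual_information` are assumptions made purely for illustration:

```python
import numpy as np

def mutual_information(x, y, bins: int = 16) -> float:
    """Histogram-based estimate of the mutual information I(X; Y), in nats."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()               # joint distribution P(x, y)
    px = pxy.sum(axis=1, keepdims=True)     # marginal P(x)
    py = pxy.sum(axis=0, keepdims=True)     # marginal P(y)
    nz = pxy > 0                            # skip empty cells (0 * log 0 = 0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

rng = np.random.default_rng(0)
f1 = rng.normal(size=5000)
f2 = f1 + 0.05 * rng.normal(size=5000)      # highly correlated with f1
f3 = rng.normal(size=5000)                  # independent of f1

print(mutual_information(f1, f2))           # large: f2 is largely redundant
print(mutual_information(f1, f3))           # near zero: f3 adds new information
```

A feature pair with a high score is a candidate for pruning, while a pair scoring near zero carries complementary information and both features are worth keeping.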
The list of divergences is quite extensive and includes the following methods:
- Kullback-Leibler (KL) divergence estimates the similarity between two probability distributions [5:1]
- Jensen-Shannon divergence extends the KL formula with symmetrization and a finite upper bound (see the sketch after this list) [5:2]
- Mutual information, based on the KL divergence, measures the mutual dependence between two random variables
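The sketch below, assuming strictly positive discrete distributions, shows the symmetrization and the finite bound (log 2 in nats) that distinguish the Jensen-Shannon divergence from the plain KL divergence; the helper names `kl` and `js_divergence` are illustrative:

```python
import numpy as np

def kl(p, q) -> float:
    """KL divergence for strictly positive discrete distributions, in nats."""
    return float(np.sum(p * np.log(p / q)))

def js_divergence(p, q) -> float:
    """Jensen-Shannon divergence: symmetric and bounded above by log(2)."""
    p = np.asarray(p, dtype=float); p /= p.sum()
    q = np.asarray(q, dtype=float); q /= q.sum()
    m = 0.5 * (p + q)                       # mixture of the two distributions
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

p = [0.1, 0.4, 0.4, 0.1]
q = [0.25, 0.25, 0.25, 0.25]
print(js_divergence(p, q), js_divergence(q, p))   # identical values: symmetric
print(np.log(2))                                  # upper bound of the JS divergence
```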