Variable Clustering
Variable clustering is used to measure collinearity, assess redundancy, and separate variables into clusters that can each be treated as a single variable, which results in data reduction. Hierarchical cluster analysis on variables can use any one of the following similarity measures: Hoeffding's D statistic, squared Pearson or Spearman correlations, or the proportion of observations for which two variables are both positive. The idea is to find clusters of variables that are strongly correlated with one another and weakly correlated with variables in other clusters. This reduces a large number of features to a smaller number of features or variable clusters.
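To make the idea concrete, the following is a minimal sketch in base R, not the exercise's own solution. It assumes simulated data (the variables x1 to x4 and the latent signals z1 and z2 are hypothetical), uses squared Spearman correlation as the similarity measure, and turns it into a dissimilarity as one minus the similarity before clustering.

```r
# Simulated data (hypothetical): two latent signals, each observed twice with noise.
set.seed(42)
n  <- 200
z1 <- rnorm(n)
z2 <- rnorm(n)
df <- data.frame(
  x1 = z1 + rnorm(n, sd = 0.3),
  x2 = z1 + rnorm(n, sd = 0.3),
  x3 = z2 + rnorm(n, sd = 0.3),
  x4 = z2 + rnorm(n, sd = 0.3)
)

# Similarity: squared Spearman correlation between every pair of variables.
sim <- cor(df, method = "spearman")^2

# Convert similarity to dissimilarity and cluster the variables hierarchically.
tree <- hclust(as.dist(1 - sim), method = "complete")

# Cut the tree into two groups; the result is a named vector of cluster labels.
clusters <- cutree(tree, k = 2)
print(clusters)

# Keep one representative variable per cluster (here, simply the first listed).
representatives <- tapply(names(clusters), clusters, function(v) v[1])
print(representatives)
```

The number of clusters (k = 2) and the rule for picking a representative are illustrative choices only. The varclus function in the Hmisc package performs a comparable analysis and lets you choose the similarity measure through its similarity argument.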
Exercise 86: Using Variable Clustering
In this exercise, we will use feature clustering to identify clusters of similar features. From each cluster, we can select one or more features for the model. We will use the hierarchical clustering algorithm from the Hmisc package in R. The similarity measure should...