In this chapter, we presented the hierarchical clustering approach, focusing on the two strategies that can be employed (divisive and agglomerative). We also discussed the methods used to decide which clusters should be merged or split (linkages). In particular, given a distance metric, we analyzed the behavior of four linkage methods: single, complete, average, and Ward's method.
We showed how to build a dendrogram and how to read it in order to understand the entire hierarchical process under different linkage methods. We also introduced a specific performance measure, the cophenetic correlation, which evaluates a hierarchical algorithm without requiring knowledge of the ground truth by measuring how well the dendrogram preserves the original pairwise distances.
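As a quick recap of how the four linkages and the cophenetic correlation fit together, the following minimal sketch builds a hierarchy with each linkage method and reports its cophenetic correlation using SciPy. It assumes a small synthetic two-blob dataset (not the chapter's data) and the Euclidean metric; the variable names are illustrative only.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, cophenet
from scipy.spatial.distance import pdist

# Assumption: a synthetic 2D dataset with two well-separated blobs,
# used only for illustration (not the dataset analyzed in the chapter)
rng = np.random.default_rng(1000)
X = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(50, 2)),
    rng.normal(loc=5.0, scale=1.0, size=(50, 2)),
])

# Condensed matrix of pairwise Euclidean distances (required by Ward's method)
D = pdist(X, metric="euclidean")

results = {}
for method in ("single", "complete", "average", "ward"):
    # Build the agglomerative hierarchy with the chosen linkage
    Z = linkage(D, method=method)
    # cophenet returns the correlation between the original distances
    # and the cophenetic distances induced by the dendrogram
    c, _ = cophenet(Z, D)
    results[method] = c
    print(f"{method:>8}: cophenetic correlation = {c:.3f}")
```

A value closer to 1 indicates that the dendrogram preserves the original distances more faithfully, so comparing the four scores gives a ground-truth-free way to choose among linkages for a given dataset.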
We analyzed a larger dataset (the Water Treatment Plant dataset), formulating some hypotheses and validating them using all the tools previously discussed...