Grouping data using agglomerative clustering
Before we talk about agglomerative clustering, we need to understand hierarchical clustering. Hierarchical clustering refers to a set of clustering algorithms that build a hierarchy of nested clusters by successively splitting or merging them. This hierarchy is represented as a tree, in which the root is a single cluster containing all the data and the leaves are the individual datapoints.
Hierarchical clustering algorithms can be either bottom-up or top-down. Now what does this mean? In bottom-up algorithms, each datapoint starts out as a separate cluster containing a single object. These clusters are then successively merged, typically the two closest at each step, until everything has been merged into a single giant cluster. This is called agglomerative clustering. Top-down algorithms, on the other hand, start with a single giant cluster and successively split it until individual datapoints are reached. This is called divisive clustering. You can learn more about it at http://nlp.stanford.edu/IR-book/html/htmledition/hierarchical-agglomerative-clustering-1.html.
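To make the bottom-up idea concrete, here is a minimal sketch in plain Python. It is not the code for this recipe. It assumes single linkage (the distance between two clusters is the distance between their closest pair of points) and Euclidean distance, and the function names single_linkage_distance and agglomerative_clustering are made up for illustration. It also stops merging once a requested number of clusters remains, rather than always going all the way to one giant cluster.

```python
from itertools import combinations
from math import dist


def single_linkage_distance(cluster_a, cluster_b):
    """Distance between the closest pair of points across two clusters."""
    return min(dist(p, q) for p in cluster_a for q in cluster_b)


def agglomerative_clustering(points, num_clusters):
    """Merge the two closest clusters until num_clusters remain.

    Illustrative helper (an assumption, not a library API).
    Returns the final clusters and the list of merges in the order made.
    """
    if num_clusters < 1:
        raise ValueError("num_clusters must be at least 1")

    # Bottom-up start: every datapoint is its own single-object cluster.
    clusters = [[tuple(p)] for p in points]
    merges = []  # each entry is the pair of clusters joined at that step

    while len(clusters) > num_clusters:
        # Find the pair of clusters with the smallest linkage distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: single_linkage_distance(
                clusters[pair[0]], clusters[pair[1]]
            ),
        )
        merges.append((clusters[i], clusters[j]))
        merged = clusters[i] + clusters[j]
        # Replace the two merged clusters with their union.
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)

    return clusters, merges


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
            (5.0, 5.0), (5.1, 4.9), (9.0, 0.0)]
    clusters, merges = agglomerative_clustering(data, num_clusters=3)
    for cluster in clusters:
        print(sorted(cluster))
```

Read in order, the merges list describes the tree from the leaves upward, and calling the function with num_clusters=1 runs the merging all the way to the single giant cluster. This brute-force search over all cluster pairs is only meant to show the mechanics; it becomes slow on anything beyond small datasets.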
How to do it…
The full code for this recipe is given in the...