Hierarchical clustering
Next, we'll examine hierarchical clustering. We already saw this used by pandas-profiling reports in one of the missing value plots. Hierarchical clustering can operate with a bottom-up (agglomerative) or top-down (divisive) approach. The bottom-up approach starts with each point in its own cluster and repeatedly merges the two closest clusters, as measured by a distance metric, until all points are in one cluster.
The top-down approach starts with all points in one cluster and repeatedly splits clusters until each point is in its own cluster.
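To make the bottom-up merge sequence concrete, here is a minimal sketch using SciPy's hierarchical clustering functions rather than sklearn. The six points are made up for illustration, and the choice of Ward linkage is an assumption, not something prescribed above.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Six hypothetical 2-D points forming three well-separated pairs
points = np.array([
    [0.0, 0.0], [0.0, 1.0],
    [10.0, 0.0], [10.0, 1.0],
    [20.0, 0.0], [20.0, 1.0],
])

# Bottom-up merging: each row of the result records one merge as
# (cluster index a, cluster index b, merge distance, size of new cluster)
merges = linkage(points, method="ward")

# n points always take n - 1 merges to end up in a single cluster
n_merges = merges.shape[0]

# Cut the merge tree at the level where three clusters remain
labels = fcluster(merges, t=3, criterion="maxclust")
```

Each row of `merges` is one step along the bottom-up path, and `fcluster` picks a stopping point on that path, here the one that leaves three clusters.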
We can choose a point along these paths that will give us a set of clusters. Let's look at using the sklearn implementation, which uses a bottom-up approach:
from sklearn.cluster import AgglomerativeClustering
ac = AgglomerativeClustering(n_clusters=3)
ac.fit(scaled_df)
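If you want to run this without the chapter's data, the following self-contained sketch may help. It substitutes synthetic data from make_blobs for scaled_df (an assumption made purely for illustration) and shows where the cluster assignments end up after fitting.

```python
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for scaled_df: 150 points drawn around 3 centers
X, _ = make_blobs(n_samples=150, centers=3, random_state=42)
scaled_X = StandardScaler().fit_transform(X)

ac = AgglomerativeClustering(n_clusters=3)
ac.fit(scaled_X)

# labels_ holds one cluster index (0, 1, or 2) per input row
cluster_labels = ac.labels_
```

The fitted object stores its assignments in the labels_ attribute, one entry per row of the input.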
This sklearn class works almost the same as the k-means clustering algorithm, with a primary hyperparameter n_clusters. As with k-means and other distance-based algorithms, it...