Summary
As we've seen, unsupervised learning can be a useful technique for uncovering patterns in data without the need for labels or targets. We saw how k-means and hierarchical clustering can deliver similar results, and how metrics such as the within-cluster sum-of-squares (WCSS) and the silhouette score can be used to optimize the number of clusters for k-means and hierarchical clustering. With the WCSS metric, we can use an elbow plot and find the point of maximum curvature on the plot, called the elbow, in order to find the optimal value of n_clusters.
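As a refresher, the sketch below (not the chapter's exact code) shows how the two metrics are typically computed with scikit-learn: WCSS comes from the KMeans inertia_ attribute and the silhouette score from sklearn.metrics.silhouette_score. The synthetic data and the n_clusters range are assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data stands in for the chapter's dataset (an assumption).
X, _ = make_blobs(n_samples=500, centers=4, random_state=42)

wcss = []
silhouettes = []
cluster_range = range(2, 11)
for k in cluster_range:
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    wcss.append(km.inertia_)  # within-cluster sum-of-squares (WCSS)
    silhouettes.append(silhouette_score(X, km.labels_))

# The elbow is where WCSS stops dropping sharply; the silhouette score
# is best at its maximum.
for k, w, s in zip(cluster_range, wcss, silhouettes):
    print(f"n_clusters={k}: WCSS={w:.1f}, silhouette={s:.3f}")
```

Plotting wcss against cluster_range gives the elbow plot described above; plotting the silhouette scores gives a quick alternative check on the same range of n_clusters values.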
The silhouette plot was demonstrated as another way to evaluate the quality of the clustering fit. We also saw how to create visualizations of clusters and look at summary statistics for clusters to understand what the clustering results mean. Lastly, we looked at how DBSCAN works and one method for deciding on the best eps and min_samples hyperparameters that determine how the clusters are formed.
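One widely used heuristic for choosing eps (which may or may not match the exact method used in the chapter) is to sort each point's distance to its min_samples-th nearest neighbor and look for the knee in that curve. The sketch below assumes scikit-learn, synthetic data, and min_samples=5; in practice you would read eps off the plotted k-distance curve rather than take a percentile as done here.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.neighbors import NearestNeighbors

X, _ = make_blobs(n_samples=500, centers=4, random_state=42)

min_samples = 5  # assumed value; often set relative to the data's dimensionality
nn = NearestNeighbors(n_neighbors=min_samples).fit(X)
distances, _ = nn.kneighbors(X)           # distances to the nearest neighbors of each point
k_distances = np.sort(distances[:, -1])   # distance to the min_samples-th neighbor, sorted

# Normally you would plot k_distances and pick eps at the knee; a high
# percentile is used here only as a crude stand-in for reading the plot.
eps = np.percentile(k_distances, 90)

labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print(f"eps={eps:.3f}, clusters found: {n_clusters}, noise points: {(labels == -1).sum()}")
```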
Now...