Chapter 16: K-Means and DBSCAN Clustering
Data clustering lets us organize unlabeled data into groups whose members have more in common with one another than with observations outside the group. Clustering has a surprisingly large number of applications, either as the final model of a machine learning pipeline or as input for another model. These include market research, image processing, and document classification. We also sometimes use clustering to improve exploratory data analysis or to create more meaningful visualizations.
K-means and density-based spatial clustering of applications with noise (DBSCAN), like principal component analysis (PCA), are unsupervised learning algorithms. There are no labels to use as the basis for predictions. The purpose of these algorithms is to identify instances that hang together based on their features. Instances that are in close proximity to each other, and farther away from other instances, can...
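To make the idea of grouping by proximity concrete, here is a minimal sketch of the k-means loop in plain Python. It is an illustration only, not the implementation any particular library uses: the toy points, the starting centers, and the helper names `assign`, `update`, and `kmeans` are all assumptions made for this example. DBSCAN, which groups points by density rather than by distance to a center, is not covered by this sketch.

```python
import math


def assign(points, centers):
    """Return, for each point, the index of its nearest center."""
    return [
        min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
        for p in points
    ]


def update(points, labels, centers):
    """Move each center to the mean of the points assigned to it."""
    new_centers = []
    for j, old_center in enumerate(centers):
        members = [p for p, label in zip(points, labels) if label == j]
        if not members:
            # Leave the center of an empty cluster where it is.
            new_centers.append(old_center)
            continue
        new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
    return new_centers


def kmeans(points, centers, max_iter=100):
    """Alternate assignment and update steps until the centers stop moving."""
    for _ in range(max_iter):
        labels = assign(points, centers)
        new_centers = update(points, labels, centers)
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


# Toy data (made up for illustration): two tight groups in two dimensions.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]

# Hypothetical starting centers: one point taken from each visible group.
labels, centers = kmeans(points, centers=[points[0], points[3]])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

The first three points end up in one cluster and the last three in the other, because each point sits closer to its own group's center than to the other group's. With real data we would not know good starting centers in advance, so the result can depend on how they are chosen.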