Locating regions of high density via DBSCAN
Although we can't cover the vast number of clustering algorithms in this chapter, let's at least include one more approach to clustering: density-based spatial clustering of applications with noise (DBSCAN), which does not assume spherical clusters as k-means does, nor does it partition the dataset into hierarchies that require a manual cut-off point. As its name implies, density-based clustering assigns cluster labels based on dense regions of points. In DBSCAN, the notion of density is defined as the number of points within a specified radius, ε.
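To make this notion of density concrete, the following is a minimal sketch that counts, for every point, how many points lie within a given radius. The function name `neighbor_counts`, the toy array `X`, and the value `eps=0.2` are hypothetical and chosen only for illustration. The sketch also assumes Euclidean distance and counts each point as its own neighbor, which is one common convention rather than the only possible one.

```python
import numpy as np

def neighbor_counts(X, eps):
    """Count the points within radius eps of each row of X (illustrative helper)."""
    # Pairwise differences between all points, shape (n, n, d)
    diffs = X[:, None, :] - X[None, :, :]
    # Pairwise Euclidean distances, shape (n, n)
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    # Assumption: a point counts as its own neighbor (distance 0 <= eps)
    return (dists <= eps).sum(axis=1)

# Toy data: three points close together and one far away
X = np.array([[0.0, 0.0],
              [0.1, 0.0],
              [0.0, 0.1],
              [5.0, 5.0]])

counts = neighbor_counts(X, eps=0.2)
print(counts)
```

The three nearby points each have a higher count than the isolated one, which is the kind of local density difference that DBSCAN uses when it labels points. The pairwise computation here takes quadratic memory, so it is meant only for small examples.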
According to the DBSCAN algorithm, a special label is assigned to each example (data point) using the following criteria:
- A point is considered a core point if at least a specified number (MinPts) of neighboring points fall within the specified radius, ε.
- A border point is a point that has fewer neighbors than MinPts within ε, but lies within...