Locating regions of high density via DBSCAN
Although we can't cover the vast number of different clustering algorithms in this chapter, let's at least introduce one more approach to clustering: Density-based Spatial Clustering of Applications with Noise (DBSCAN). In DBSCAN, density is defined as the number of points within a specified radius ε.
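To make this notion of density concrete, the following minimal sketch counts how many points lie within the radius of each point in a tiny, made-up dataset. The array `X`, the value `eps = 1.0`, and the use of Euclidean distance are illustrative assumptions, as is the choice to let each point count itself as lying within its own radius.

```python
import numpy as np

# Hypothetical toy data: three points close together and one far away
X = np.array([[0.0, 0.0], [0.5, 0.0], [0.0, 0.5], [3.0, 3.0]])
eps = 1.0  # assumed radius, chosen only for illustration

# Pairwise Euclidean distances between all points (shape: n x n)
diff = X[:, None, :] - X[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=2))

# Density of each point: number of points (itself included) within eps
density = (dist <= eps).sum(axis=1)
print(density)
```

The three nearby points each have a higher count than the isolated point, which is the kind of difference DBSCAN relies on.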
In DBSCAN, a special label is assigned to each sample (point) using the following criteria:
- A point is considered a core point if at least a specified number (MinPts) of neighboring points fall within the specified radius ε
- A border point is a point that has fewer neighbors than MinPts within ε, but lies within the radius ε of a core point
- All other points that are neither core nor border points are considered noise points
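The three criteria above can be sketched in a few lines of NumPy. This is a minimal illustration, not a full DBSCAN implementation: the function name `label_points`, the toy data, and the parameter values are hypothetical. It also assumes Euclidean distance and counts a point as its own neighbor, which matches the convention of scikit-learn's `min_samples` parameter but may differ from other descriptions of the algorithm.

```python
import numpy as np

def label_points(X, eps, min_pts):
    """Label each row of X as 'core', 'border', or 'noise' (illustrative helper)."""
    # Pairwise Euclidean distances between all points
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    within = dist <= eps  # within[i, j] is True if j lies within eps of i

    # Core: at least min_pts points (the point itself included) within eps
    is_core = within.sum(axis=1) >= min_pts
    # Border: not core, but within eps of at least one core point
    near_core = (within & is_core[None, :]).any(axis=1)
    is_border = ~is_core & near_core
    # Everything else is noise
    return np.where(is_core, "core", np.where(is_border, "border", "noise"))

# Hypothetical toy data: a dense group, one point on its fringe, one outlier
X = np.array([[0.0, 0.0], [0.5, 0.0], [0.0, 0.5], [1.3, 0.0], [5.0, 5.0]])
labels = label_points(X, eps=1.0, min_pts=3)
print(labels)
```

With these assumed values, the first three points are core points, the fourth is a border point because it is within ε of only one neighbor, which happens to be a core point, and the last is noise.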
After labeling the points as core, border, or noise points, the DBSCAN algorithm can be summarized in two simple steps:
- Form a separate cluster for each core point or a connected group of core points (core...