Introducing DBSCAN
The basic idea behind the density-based spatial clustering of applications with noise (DBSCAN) algorithm is that clusters are regions of high point density, separated from one another by regions of low point density. To identify the high-density regions, the algorithm visits each point in the dataset and checks whether its neighborhood contains a minimum number of points. Unlike K-means, DBSCAN does not require specifying the number of clusters in advance; it is also more robust to outliers and better suited to clusters with complex shapes.
To employ the algorithm, we need to set two hyperparameters:
- epsilon is the radius of the neighborhood (a circle in two dimensions) drawn around each point to check the region’s density
- minPts is the minimum number of data points that must fall within this neighborhood for its center to be labeled a core point
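The core-point test described by these two hyperparameters can be sketched in a few lines of plain Python. This is a minimal illustration, not a full DBSCAN implementation: the function name `is_core_point`, the sample coordinates, and the chosen values of `epsilon` and `min_pts` are all hypothetical. It also assumes Euclidean distance and that a point counts toward its own neighborhood, which is a common convention but not the only one.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def is_core_point(points, index, epsilon, min_pts):
    """Return True if points[index] has at least min_pts points within epsilon.

    Assumption: the center point itself is included in the count.
    """
    center = points[index]
    # Count every point whose distance to the center is at most epsilon
    neighbors = sum(1 for p in points if dist(center, p) <= epsilon)
    return neighbors >= min_pts


# Hypothetical toy data: four points packed closely together and one far away
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]

core_flags = [
    is_core_point(points, i, epsilon=0.5, min_pts=3)
    for i in range(len(points))
]
print(core_flags)  # [True, True, True, True, False]
```

The four tightly packed points each see four points within the radius, so they pass the minPts threshold, while the isolated point sees only itself. This check compares every pair of points, which is fine for a small example but becomes slow on large datasets.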
Data points that have fewer than minPts points in their neighborhood but lie within epsilon of a core point are called border points. Finally, data points without...