Clusters as Neighborhoods
Until now, we have explored the concept of likeness being described as a function of Euclidean distance – data points that are closer to any one point can be seen as similar, while those that are further away in Euclidean space can be seen as dissimilar. This notion is seen once again in the DBSCAN algorithm. As alluded to by the lengthy name, the DBSCAN approach expands upon basic distance metric evaluation by also incorporating the notion of density. If there are clumps of data points that all exist in the same area as one another, they can be seen as members of the same cluster:
In the preceding figure, we can see four neighborhoods. The density-based approach has a number of benefits when compared to the past approaches we've covered that focus exclusively on distance. If you were just focusing on distance as a clustering threshold, then you may find your...