DBSCAN is a powerful algorithm that can easily solve non-convex problems where K-means fails. The main idea is quite simple: a cluster is a high-density area (there are no restrictions on its shape) surrounded by a low-density one. This statement is generally true and doesn't need an initial declaration about the number of expected clusters. The procedure is mainly based on a metric function (normally the Euclidean distance) and a radius, ε. Given a sample xi, its boundary is checked for other samples. If it is surrounded by at least nmin points, it becomes a core point:
A sample xj is defined as directly reachable from a core point xi if:
An analogous concept holds for sequences of directly reachable points. Hence, if there's a sequence xi → xi+1 → ... → xj, then xi and xj are said to be reachable. Moreover, given a sample xk, if...