DBSCAN is a powerful algorithm that can easily solve non-convex problems where K-means fails. The main idea is quite simple: a cluster is a high-density area (there are no restrictions on its shape) surrounded by a low-density one. This statement is generally true and doesn't need an initial declaration about the number of expected clusters. The procedure is mainly based on a metric function (normally the Euclidean distance) and a radius, ε. Given a sample xi, its boundary is checked for other samples. If it is surrounded by at least nmin points, it becomes a core point:
![](https://static.packt-cdn.com/products/9781789347999/graphics/assets/391806c0-0cbe-4fb7-99e7-c079db78bb79.png)
A sample xj is defined as directly reachable from a core point xi if:
![](https://static.packt-cdn.com/products/9781789347999/graphics/assets/187051a9-b8e8-4956-b0d4-961bb773c550.png)
An analogous concept holds for sequences of directly reachable points. Hence, if there's a sequence xi → xi+1 → ... → xj, then xi and xj are said to be reachable. Moreover, given a sample xk, if...