When we discussed the k-means algorithm, we saw that we had to supply the number of clusters as one of the input parameters. In the real world, we usually won't have this information available. We can sweep the parameter space and pick the number of clusters that gives the best silhouette coefficient score, but this is expensive because the data has to be clustered and scored once for every candidate value. A method that returns the number of clusters in our data would be an excellent solution to the problem. DBSCAN does just that for us: it groups points by density, so the number of clusters comes out of the analysis instead of going into it.
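As a minimal sketch of the idea, the following snippet runs DBSCAN on synthetic data and counts the clusters it finds. The data generated with make_blobs, the helper name estimate_num_clusters, and the eps and min_samples values are all assumptions chosen for illustration; they are not the recipe's data or settings. Note that DBSCAN still has parameters of its own (the neighborhood radius eps and the minimum number of neighbors min_samples), and the cluster count it reports depends on them.

```python
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs


def estimate_num_clusters(X, eps=0.5, min_samples=5):
    """Fit DBSCAN and return (number of clusters, per-point labels).

    eps and min_samples are illustrative defaults, not tuned values.
    """
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit(X).labels_
    # DBSCAN marks noise points with the label -1; do not count it as a cluster
    n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
    return n_clusters, labels


# Synthetic stand-in data: three tight, well-separated blobs (an assumption)
X, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0], [10, 10], [-10, 10]],
    cluster_std=0.5,
    random_state=0,
)

n_clusters, labels = estimate_num_clusters(X, eps=1.5, min_samples=5)
print("Estimated number of clusters:", n_clusters)
```

Because eps controls how close points must be to count as neighbors, a value that is too small tends to label many points as noise, while a value that is too large can merge distinct clusters into one.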
Estimating the number of clusters using the DBSCAN algorithm
Getting ready
In this recipe, we will perform a DBSCAN analysis using the sklearn.cluster.DBSCAN class. We will use the same data that we used in the previous Evaluating the performance...