Understanding DBSCAN and CBDBSCAN
DBSCAN stands for Density-Based Spatial Clustering of Applications with Noise. It is a popular clustering algorithm that identifies clusters as regions where data points are densely packed. Checkback DBSCAN (CBDBSCAN) is an extension of DBSCAN that is employed in Ensemble LDA. Let’s first look at DBSCAN, then at why CBDBSCAN extends it.
DBSCAN
DBSCAN is an unsupervised machine learning algorithm that clusters data points based on their density and proximity to each other. Before looking at the algorithm, let’s get familiar with a few of its terms. The first is epsilon, the maximum distance at which two data points still count as neighbors. The value of epsilon is set before running the algorithm. It should be small enough that separate dense regions are not merged into one cluster, but not so small that it fragments the data into too many clusters. The second term is minPts. It is the minimum number of neighbors required for a point...
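To make the role of epsilon concrete, here is a minimal sketch in plain Python that counts how many neighbors each point has within a given epsilon. The helper name eps_neighbors and the sample points are hypothetical, chosen only for illustration, and the sketch assumes Euclidean distance and does not count a point as its own neighbor (implementations differ on that convention).

```python
import math

def eps_neighbors(points, index, eps):
    """Return indices of all other points within distance eps of points[index].

    Assumption: Euclidean distance, and a point is not its own neighbor.
    """
    center = points[index]
    return [
        j for j, p in enumerate(points)
        if j != index and math.dist(center, p) <= eps
    ]

# Hypothetical sample data: three points close together and one far away.
points = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.6), (5.0, 5.0)]
eps = 1.0

# Number of neighbors each point has inside its epsilon radius.
neighbor_counts = [len(eps_neighbors(points, i, eps)) for i in range(len(points))]
print(neighbor_counts)  # [2, 2, 2, 0]
```

With eps = 1.0 the three nearby points all see each other, while the distant point has no neighbors. Shrinking eps to 0.55 leaves the first point with a single neighbor, which shows how a smaller epsilon thins out the neighborhoods the algorithm works with.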