Implementing DBSCAN clustering
DBSCAN is a very flexible approach to clustering. We need to specify only two hyperparameters: a value for ɛ, also referred to as eps, and the minimum number of samples. As we have discussed, the ɛ value determines the size of the ɛ-neighborhood around an instance. The minimum samples hyperparameter indicates how many instances must fall within that neighborhood for the instance to be considered a core instance.
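To make the roles of the two hyperparameters concrete, here is a minimal sketch on a small made-up dataset. The data and the values `eps=0.5` and `min_samples=3` are assumptions chosen for illustration only; they are not the settings used for the income gap data. Note that in scikit-learn, `min_samples` counts the instance itself.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical toy data: two tight groups of three points and one isolated point.
X = np.array([
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1],   # group near the origin
    [5.0, 5.0], [5.1, 5.0], [5.0, 5.1],   # group near (5, 5)
    [10.0, 0.0],                          # isolated point
])

# eps sets the radius of the neighborhood around each instance.
# min_samples is the number of instances (including the instance itself)
# that must fall within that radius for the instance to be a core instance.
model = DBSCAN(eps=0.5, min_samples=3).fit(X)

labels = model.labels_                  # cluster label per instance; -1 marks noise
core_idx = model.core_sample_indices_   # indices of the core instances

print("labels:", labels)
print("core instances:", core_idx)
```

With these assumed settings, every point in the two tight groups has three instances within its ɛ-neighborhood and so is a core instance, while the isolated point has no neighbors and is labeled -1 (noise). Shrinking `eps` or raising `min_samples` would turn more instances into noise.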
Note
We use DBSCAN to cluster the same income gap data that we worked with in the previous section.
Let’s build a DBSCAN clustering model:
- We start by loading familiar libraries, plus the DBSCAN class:

    import pandas as pd
    from sklearn.preprocessing import MinMaxScaler
    from sklearn.pipeline import make_pipeline
    from sklearn.cluster import DBSCAN
    from sklearn.impute import KNNImputer
    from sklearn.metrics import silhouette_score
    import matplotlib.pyplot as plt
    import os
    import sys
    sys.path.append(os.getcwd() + "/helperfunctions")

- We import the code to load and...