Clustering problems
In addition to anomaly detection, there is another class of problem that takes an unsupervised approach, grouping entities together to learn more about the dataset. Clustering is the process of finding elements of a dataset that share enough similar attributes to form groups that are clearly distinct from one another.
This technique has many applications, such as the following:
- Grouping a customer base into segments
- Separating promotional emails from more important ones
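As a minimal sketch of the first application, the hypothetical function `kmeans` below segments a made-up customer base by two attributes, annual spend (in thousands) and store visits per month, using plain K-Means (Lloyd's algorithm). The function name, the random initialization, the iteration cap and the customer data are all assumptions for illustration; in practice you would more likely use a library implementation such as scikit-learn's `KMeans` class.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Hypothetical helper: group points into k clusters with Lloyd's algorithm."""
    rng = random.Random(seed)
    # Start from k distinct data points chosen at random as the initial centroids.
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids


# Made-up customers: (annual spend in thousands, visits per month).
customers = [(1.2, 1), (1.5, 2), (1.0, 1), (8.5, 10), (9.0, 12), (8.8, 11)]
labels, centers = kmeans(customers, k=2)
for customer, label in zip(customers, labels):
    print(customer, "-> segment", label)
```

Because K-Means relies on raw Euclidean distance, attributes on very different scales are usually standardized before clustering; the toy data here skips that step.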
To do this, we can use a number of algorithms, such as the following:
- DBScan
- K-Means clustering
There are many more, but these two have shown promising results across a variety of datasets and are a good place to start.
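The density-based idea behind DBScan can be previewed with a minimal, brute-force pure-Python sketch. The function `dbscan` below is hypothetical and is not scikit-learn's implementation; only its parameter names are borrowed from scikit-learn's `DBSCAN` class: `eps` is the neighbourhood radius, and `min_samples` is the number of points (counting the point itself) a neighbourhood needs for its centre to count as a core point. Points not reachable from any core point are labelled `-1` as noise. The sample points are made up.

```python
import math

NOISE = -1


def dbscan(points, eps, min_samples):
    """Hypothetical helper: label each point with a cluster id, or NOISE (-1)."""
    labels = [None] * len(points)  # None means "not visited yet"

    def neighbours(i):
        # All points (including i itself) within distance eps of point i.
        return [j for j in range(len(points)) if math.dist(points[i], points[j]) <= eps]

    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        # i is a core point: start a new cluster and grow it outward.
        labels[i] = cluster_id
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point reached from a core point
            if labels[j] is not None:
                continue  # already assigned; border points are not expanded
            labels[j] = cluster_id
            j_neighbours = neighbours(j)
            if len(j_neighbours) >= min_samples:
                queue.extend(j_neighbours)  # j is also a core point: keep expanding
        cluster_id += 1
    return labels


points = [(1.0, 1.0), (1.2, 1.1), (0.9, 1.3), (8.0, 8.0), (8.1, 8.2), (7.9, 8.1), (4.5, 0.0)]
labels = dbscan(points, eps=0.5, min_samples=3)
print(labels)  # → [0, 0, 0, 1, 1, 1, -1]
```

Unlike K-Means, this approach does not need the number of clusters up front, and the isolated point at (4.5, 0.0) is reported as noise rather than forced into a cluster. The pairwise neighbour search is O(n²), which is acceptable for a sketch but not for large datasets.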
Let's look at DBScan first.
DBScan
Density-Based Spatial Clustering of Applications with Noise (or DBScan for...