Clustering Performance Evaluation
Unlike supervised learning, where we always have the labels to evaluate our predictions with, unsupervised learning is a bit more complex as we do not usually have labels. In order to evaluate a clustering model, two approaches can be taken depending on whether the label data is available or not:
- The first approach is the extrinsic method, which requires the existence of label data. This means that in absence of label data, human intervention is required in order to label the data or at least a subset of it.
- The other approach is the intrinsic approach. In general, the extrinsic approach tries to assign a score to clustering, given the label data, whereas the intrinsic approach evaluates clustering by examining how well the clusters are separated and how compact they are.
Note
We will skip the mathematical explanations as they are quite complicated.
You can find more mathematical details on the sklearn website at this URL: https://scikit...