We have talked a little about assessing clusters when the ground truth is not known. However, we have not yet talked about assessing k-means when the ground truth is known. In many cases, the ground truth isn't knowable; however, if there is outside annotation, we will sometimes know the ground truth, or at least a proxy for it.
Assessing cluster correctness
Getting ready
So, let's assume a world where we have an outside agent supplying us with the ground truth.
We'll create a simple dataset, evaluate the measures of correctness against the ground truth in several ways, and then discuss them:
from sklearn import datasets
from sklearn import cluster
# Generate 1,000 points in three blobs; ground_truth holds each point's true blob label
blobs, ground_truth = datasets.make_blobs(1000, centers=3, cluster_std=1.75)
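As a rough sketch of what evaluating against the ground truth can look like, the snippet below fits k-means to the same kind of dataset and compares the predicted labels with the true ones. The choice of metrics (homogeneity, completeness, V-measure, and the adjusted Rand index from `sklearn.metrics`) is an assumption about which measures are useful here, not a statement of the recipe's exact steps. The `random_state` and `n_init` values are also assumptions, added only so the run is reproducible.

```python
from sklearn import cluster, datasets, metrics

# random_state is an assumption added for reproducibility; it is not in the original call
blobs, ground_truth = datasets.make_blobs(
    1000, centers=3, cluster_std=1.75, random_state=0
)

# Fit k-means with the same number of clusters as the generated blobs
kmeans = cluster.KMeans(n_clusters=3, n_init=10, random_state=0)
predicted = kmeans.fit_predict(blobs)

# Each metric takes (true labels, predicted labels) and ignores how the
# cluster IDs happen to be numbered
scores = {
    "homogeneity": metrics.homogeneity_score(ground_truth, predicted),
    "completeness": metrics.completeness_score(ground_truth, predicted),
    "v_measure": metrics.v_measure_score(ground_truth, predicted),
    "adjusted_rand": metrics.adjusted_rand_score(ground_truth, predicted),
}

for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

These metrics are used instead of plain accuracy because k-means assigns arbitrary cluster numbers: cluster 0 in the prediction need not correspond to blob 0 in the ground truth, so a direct label-by-label comparison would be misleading.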