In the previous section, we discussed an algorithm based on the assumption that the data-generating process can be represented as a weighted sum of multivariate Gaussian distributions. What happens when the covariance matrices are shrunk towards zero? As it's easy to imagine, when Σi → 0, the corresponding distribution degenerates to a Dirac's Delta centered on the mean. In other words, the probability will become almost 1 if the sample is extremely close to the mean, and 0 otherwise. In this case, the membership to a cluster becomes binary and it's determined only by the distance between the sample and the mean (the shortest distance will determine the winning cluster).
The K-means algorithm is the natural hard extension of Gaussian mixture and it's characterized by k (pre-determined) centroids or means (which justifies the name):
The...