K-means
When we discussed the Gaussian mixture algorithm, we defined it as Soft K-means. The reason is that each cluster was represented by three elements: mean, variance, and weight. Each sample always belongs to all clusters with a probability provided by the Gaussian distributions. This approach can be very useful when it's possible to manage the probabilities as weights, but in many other situations, it's preferable to determine a single cluster per sample. Such an approach is called hard clustering and K-means can be considered the hard version of a Gaussian mixture. In fact, when all variances Σi → 0, the distributions degenerate to Dirac's Deltas, which represent perfect spikes centered at a specific point. In this scenario, the only possibility to determine the most appropriate cluster is to find the shortest distance between a sample point and all the centers (from now on, we are going to call them centroids). This approach is also based on an important double principle that should...