K-means is the simplest implementation of the principle of maximum separation and maximum internal cohesion. Let's suppose we have a dataset $X \in \mathbb{R}^{M \times N}$ (that is, M N-dimensional samples) that we want to split into K clusters, and a set of K centroids corresponding to the means of the samples assigned to each cluster $K_j$:
![](https://static.packt-cdn.com/products/9781789348279/graphics/assets/02e9a6e4-5321-44f8-b545-b0e6cd0bf18f.png)
The set M and the centroids have an additional index (as a superscript) indicating the iterative step. Starting from an initial guess $M^{(0)}$, K-means tries to minimize an objective function called inertia (that is, the sum of the squared intra-cluster distances between the samples assigned to a cluster $K_j$ and its centroid $\mu_j$):
![](https://static.packt-cdn.com/products/9781789348279/graphics/assets/56a960f6-4a96-4e97-883e-17277d02904c.png)
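As a rough illustration, the two definitions above can be sketched in NumPy. The function names `compute_centroids` and `inertia` and the toy dataset are assumptions made for this example. Inertia is taken here as the sum of squared Euclidean distances, which is the convention scikit-learn's `KMeans` uses for its `inertia_` attribute.

```python
import numpy as np


def compute_centroids(X, labels, n_clusters):
    # Each centroid is the mean of the samples currently assigned to its cluster.
    # Assumption: every cluster contains at least one sample.
    return np.array([X[labels == j].mean(axis=0) for j in range(n_clusters)])


def inertia(X, labels, centroids):
    # Sum of squared Euclidean distances between each sample and the
    # centroid of the cluster it is assigned to.
    diffs = X - centroids[labels]
    return float(np.sum(diffs ** 2))


# Hypothetical toy dataset: M = 4 samples, N = 2 dimensions, K = 2 clusters
X = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])

# A cohesive assignment: the two left points together, the two right points together
labels_good = np.array([0, 0, 1, 1])
centroids_good = compute_centroids(X, labels_good, 2)
s_good = inertia(X, labels_good, centroids_good)

# A poor assignment that mixes left and right points in each cluster
labels_bad = np.array([0, 1, 0, 1])
centroids_bad = compute_centroids(X, labels_bad, 2)
s_bad = inertia(X, labels_bad, centroids_bad)

print(s_good)  # 4.0
print(s_bad)   # 100.0
```

On the same dataset, the more cohesive assignment yields the lower inertia, which is why the quantity is useful for comparing successive iterations rather than as a standalone score.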
It's easy to understand that $S^{(t)}$ cannot be considered an absolute measure because its value is highly influenced by the variance of the samples. However, $S^{(t+1)} < S^{(t)}$ implies that the centroids are moving...