k-medoids
As we described earlier, the k-means and k-medians algorithms are best suited to particular distance metrics, the squared Euclidean and Manhattan distance respectively, because the cluster centers they compute (the mean and the median) are exactly the points that minimize the statistic these algorithms attempt to optimize (the total squared distance or total distance to cluster members). In cases where we want to use other distance metrics (such as correlations), we can instead use the k-medoids method (Theodoridis, Sergios, and Konstantinos Koutroumbas. Pattern Recognition, 2003), which consists of the following steps:
- Select k datapoints as the initial cluster centers (medoids).
- For each datapoint, find the nearest cluster center under the chosen distance metric and assign the point to that cluster.
- For each cluster center and each non-center point, tentatively swap the center with the point and compute the change in total distance from cluster members to their centers. Keep the swap if it reduces the total distance; otherwise undo it. Repeat the assignment and swap steps until no swap yields an improvement (a minimal sketch follows this list).
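As a concrete illustration, here is a minimal sketch of this swap procedure in Python. The function name k_medoids, the distance callable, and parameters such as max_iter and seed are our own choices for illustration, not part of any particular library:

```python
import numpy as np

def total_cost(D, medoids):
    # each point contributes its distance to its nearest medoid
    return D[:, medoids].min(axis=1).sum()

def k_medoids(X, k, distance, max_iter=100, seed=0):
    """Cluster the rows of X around k medoids under an arbitrary distance."""
    rng = np.random.default_rng(seed)
    n = len(X)
    # precompute the pairwise distance matrix once; any metric works here
    D = np.array([[distance(a, b) for b in X] for a in X])
    medoids = list(rng.choice(n, size=k, replace=False))
    for _ in range(max_iter):
        improved = False
        # try swapping each medoid with each non-medoid point
        for i in range(k):
            for p in range(n):
                if p in medoids:
                    continue
                candidate = medoids[:i] + [p] + medoids[i + 1:]
                # keep the swap only if it lowers the overall distance
                if total_cost(D, candidate) < total_cost(D, medoids):
                    medoids = candidate
                    improved = True
        if not improved:
            break
    # final assignment of each point to its nearest medoid
    labels = np.argmin(D[:, medoids], axis=1)
    return medoids, labels

# illustrative usage with a correlation-based distance between rows
X = np.random.default_rng(1).normal(size=(30, 5))
corr_dist = lambda a, b: 1.0 - np.corrcoef(a, b)[0, 1]
medoids, labels = k_medoids(X, k=3, distance=corr_dist)
```

Because the medoids are always actual datapoints rather than averages, any pairwise distance (such as the correlation-based distance above) can be plugged in without changing the algorithm.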