The EM algorithm by example of k-means clustering
Probably the most famous algorithm for clustering observations to groups is the k-means algorithm. We will see that this algorithm is just a variant of the EM algorithm.
Given n objects, characterized by p variables, we like to partition them into clusters such that cluster has members and each observation is in one cluster. The mean vector (center, prototype), Vk, of a cluster is defined as the centroid of the cluster and the components of the mean vector can be calculated by where is the number of observations in cluster and is the i-th observation belonging to cluster . For each cluster the corresponding cluster means are calculated.
We also need to determine the number of clusters in the output partition. Starting from the given initial locations of the cluster centroids, the algorithm uses the data points to iteratively relocate the centroids and reallocate points to the closest centroid. The process is composed of these steps...