K-means Clustering
Like HCA, k-means uses distance to group observations into clusters that are not labeled in the data. However, rather than linking observations to each other as HCA does, k-means assigns each observation to one of k clusters, where k is a number chosen by the user.
To determine the cluster to which each observation belongs, k cluster centers are randomly generated, and each observation is assigned to the cluster whose center is closest to it in Euclidean distance. Like the starting weights in artificial neural networks, the cluster centers are initialized at random. After the cluster centers have been generated, the algorithm proceeds in two phases:
- Assignment phase
- Updating phase
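The two phases can be sketched in a few lines of plain Python. This is a minimal illustration, not the implementation used elsewhere in this chapter. The function names (`assign`, `update`, `k_means`) are hypothetical, and several details are assumptions: the initial centers are drawn at random from the observations themselves, an empty cluster keeps its previous center, and the loop stops once the assignments no longer change.

```python
import math
import random


def assign(points, centers):
    """Assignment phase: label each point with the index of its nearest center."""
    return [
        min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
        for p in points
    ]


def update(points, labels, centers):
    """Updating phase: move each center to the mean of the points assigned to it."""
    new_centers = []
    for j, old_center in enumerate(centers):
        members = [p for p, label in zip(points, labels) if label == j]
        if not members:
            # Assumption: a cluster with no members keeps its previous center.
            new_centers.append(old_center)
            continue
        # Average each coordinate across the cluster's members.
        new_centers.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
    return new_centers


def k_means(points, k, seed=None, max_iter=100):
    """Alternate the two phases until the assignments stop changing."""
    rng = random.Random(seed)
    # Assumption: the random initial centers are k distinct observations.
    centers = rng.sample(points, k)
    labels = assign(points, centers)
    for _ in range(max_iter):
        centers = update(points, labels, centers)
        new_labels = assign(points, centers)
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centers


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers = k_means(points, k=2, seed=0)
print(labels)
print(centers)
```

Because the starting centers are random, the cluster numbering (which group is called 0 and which is 1) can differ from one seed to the next, even when the grouping itself is the same.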
Note
The random generation of cluster centers is important to remember, and we will revisit it later in this chapter. Some refer to this random generation of cluster centers as a weakness of the algorithm, because results can vary between fits of the same model on the same data, and it is not guaranteed to assign observations to the...