Partitioning using k-means clustering
The goal of partitioning is to divide the data into clusters that reduce the within-cluster sum of squared errors. In the extreme case, you could achieve a sum of squared errors of zero by placing every data point in its own cluster. That would not be very useful, though, would it? Partitioning is therefore about balancing a low error against a sensible number of clusters.
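To make the trade-off concrete, here is a minimal sketch that computes the within-cluster sum of squared errors for a small, made-up data set under three different groupings. The function name `wcss`, the four sample points, and the label lists are all assumptions chosen for illustration; they do not come from any particular library.

```python
def wcss(points, labels):
    """Within-cluster sum of squared errors for points with integer cluster labels."""
    # Group the points by their cluster label.
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    total = 0.0
    for members in clusters.values():
        # Cluster mean, computed coordinate by coordinate.
        mean = [sum(coord) / len(members) for coord in zip(*members)]
        # Add each member's squared Euclidean distance to that mean.
        for point in members:
            total += sum((x - m) ** 2 for x, m in zip(point, mean))
    return total


# Hypothetical data: two tight pairs of points that sit far apart.
points = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0)]

one_cluster = wcss(points, [0, 0, 0, 0])        # everything lumped together
two_clusters = wcss(points, [0, 0, 1, 1])       # one cluster per pair
every_point_alone = wcss(points, [0, 1, 2, 3])  # the extreme case

print(one_cluster, two_clusters, every_point_alone)  # 99.5 1.0 0.0
```

Going from one cluster to two removes almost all of the error, while going from two clusters to four removes only the small remainder and leaves clusters that describe nothing beyond the individual points.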
A commonly used partitioning method is k-means, which you will more often see referred to as k-means clustering. K-means clustering places centers at k locations in the observation space to serve as the means of the k clusters. For example, if you were performing k-means clustering with k = 3, you would place three cluster means somewhere in the data space to set the initial conditions of the analysis.
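As a rough illustration of setting those initial conditions with k = 3, the sketch below picks three starting means in two ways: by sampling three of the observations at random, and by specifying positions by hand. Using existing observations as random starting points is only one common choice and is an assumption here, as are the helper name `random_initial_centers`, the sample data, and the fixed seed.

```python
import random


def random_initial_centers(points, k, seed=0):
    """Pick k distinct observations to serve as the starting cluster means."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return rng.sample(points, k)


# Hypothetical two-dimensional observations.
points = [
    (1.0, 1.0), (1.5, 2.0), (3.0, 4.0), (5.0, 7.0),
    (3.5, 5.0), (4.5, 5.0), (3.5, 4.5),
]

# Option 1: three starting means drawn at random from the data.
random_centers = random_initial_centers(points, k=3)

# Option 2: three starting means placed at hand-picked positions.
fixed_centers = [(1.0, 1.0), (3.0, 4.0), (5.0, 7.0)]

print(random_centers)
print(fixed_centers)
```

Either way, these positions are only a starting point for the analysis, not the final cluster means.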
K-means iterates through the following three primary steps:
- Specify the number of clusters, k. Set their initial locations either randomly or at specific, chosen positions.
- The...