The K-Means Algorithm
The k-means algorithm is a flat clustering algorithm, as mentioned previously. It works as follows:
- Set the value of k.
- Choose k data points from the dataset to serve as the initial centers of the individual clusters.
- Calculate the distance from each data point to each of the chosen center points, and assign each point to the cluster whose center is closest to it.
- Once all the points are in one of the k clusters, calculate the center point of each cluster. This center point does not have to be an existing data point in the dataset; it is simply the average of the points in the cluster.
- Repeat the assignment and recalculation steps: reassign each data point to the cluster whose center is now closest to it, then recompute the center points. Repetition continues until the center points no longer move.
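The steps above can be sketched in a few lines of plain Python. This is a minimal illustration rather than a reference implementation: the function name k_means, the seed parameter, the use of Euclidean distance, and the choice to leave an empty cluster's center where it was are all assumptions made for this example.

```python
import math
import random


def k_means(points, k, seed=0):
    """Cluster `points` (a list of equal-length tuples) into k groups."""
    rng = random.Random(seed)
    # Choose k data points from the dataset as the initial centers.
    centers = rng.sample(points, k)

    while True:
        # Assign each point to the cluster whose center is closest
        # (Euclidean distance is assumed here).
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)

        # Recompute each center as the coordinate-wise average of its cluster.
        # Assumption: an empty cluster keeps its previous center.
        new_centers = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]

        # Stop once the center points no longer move.
        if new_centers == centers:
            return centers, clusters
        centers = new_centers
```

Calling k_means(points, 2) on a list of 2D tuples returns the final centers together with the points grouped by cluster. The returned centers are averages, so they will generally not coincide with any point in the dataset.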
To ensure that the k-means algorithm terminates, we need the following:
- A threshold value (tolerance): the algorithm terminates once the center points move by less than this value between repetitions
- A maximum number of repetitions of shifting...
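One way to combine these two safeguards is a small check that runs after every repetition. The sketch below reads the threshold as a tolerance on how far any center moved, which is an interpretation rather than something the text spells out; the name should_stop and the default values for tol and max_iter are illustrative assumptions.

```python
import math


def should_stop(old_centers, new_centers, iteration, tol=1e-4, max_iter=300):
    """Return True when either stopping rule applies."""
    # Largest distance that any single center moved in this repetition.
    largest_shift = max(
        math.dist(old, new) for old, new in zip(old_centers, new_centers)
    )
    # Rule 1: the centers have (almost) stopped moving.
    # Rule 2: the repetition budget has been used up.
    return largest_shift < tol or iteration >= max_iter
```

With a check like this, the loop ends as soon as the centers settle, and the repetition cap guarantees termination even if they keep shifting by small amounts.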