The k-means Algorithm
The k-means algorithm is a flat clustering algorithm. It works as follows:
- Set the value of K.
- Choose K data points from the dataset to serve as the initial centers of the individual clusters.
- Calculate the distance from each data point to each of the chosen centers, and assign each point to the cluster whose center is closest to it.
- Once all of the points are in one of the K clusters, calculate the center point of each cluster. This center point does not have to be an existing data point in the dataset; it is simply the average of the points in the cluster.
- Repeat the previous two steps: reassign each data point to the cluster whose center is closest to it, and then recalculate the center of each cluster. Repetition continues until the center points no longer move.
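The steps above can be sketched in a few lines of plain Python. This is a minimal illustration rather than a reference implementation: the function names (k_means, closest_center, mean_point) and the seed parameter are hypothetical, the distance is assumed to be Euclidean, and a cluster that ends up empty is assumed to keep its previous center.

```python
import math
import random


def closest_center(point, centers):
    """Return the index of the center nearest to point (Euclidean distance)."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))


def mean_point(points):
    """Return the coordinate-wise average of a non-empty list of points."""
    return tuple(sum(axis) / len(points) for axis in zip(*points))


def k_means(points, k, seed=0):
    """Cluster points (a list of tuples) into k groups; return (centers, clusters)."""
    # Choose K data points from the dataset as the initial centers.
    centers = random.Random(seed).sample(points, k)
    while True:
        # Assign each point to the cluster whose center is closest.
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[closest_center(p, centers)].append(p)
        # Recalculate each center as the average of its cluster.
        # Assumption of this sketch: an empty cluster keeps its old center.
        new_centers = [
            mean_point(c) if c else centers[i] for i, c in enumerate(clusters)
        ]
        # Stop once the center points no longer move.
        if new_centers == centers:
            return centers, clusters
        centers = new_centers


if __name__ == "__main__":
    data = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9), (8, 9.5)]
    final_centers, final_clusters = k_means(data, k=2)
    print(final_centers)
    print(final_clusters)
```

Note that the returned centers are averages, so they are generally not members of the original dataset. The loop here relies only on the centers ceasing to move, which is why a practical version also needs explicit exit conditions.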
To make sure that the k-means algorithm terminates, we need the following:
- A tolerance value: we exit once the centroids move less than this value between two repetitions
- A maximum number of repetitions of shifting...
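Both exit conditions can be illustrated with a short Python sketch. The function name k_means_bounded and the parameter names tolerance and max_iterations are hypothetical, as are their default values. The sketch also assumes that centroid movement is measured as the largest Euclidean distance any single centroid travels in one repetition, and that an empty cluster keeps its previous center.

```python
import math


def k_means_bounded(points, centers, tolerance=1e-4, max_iterations=100):
    """Refine centers until they move less than tolerance or the cap is reached.

    Returns the final centers and the number of repetitions performed.
    """
    if max_iterations < 1:
        raise ValueError("max_iterations must be at least 1")
    centers = [tuple(c) for c in centers]
    for iteration in range(1, max_iterations + 1):
        # Assign each point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Recompute the centers; an empty cluster keeps its old center (assumption).
        new_centers = [
            tuple(sum(axis) / len(c) for axis in zip(*c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        # Largest distance moved by any single center in this repetition.
        largest_shift = max(
            math.dist(old, new) for old, new in zip(centers, new_centers)
        )
        centers = new_centers
        if largest_shift < tolerance:
            break  # Exit 1: the centers moved less than the tolerance.
    # Exit 2: the loop ended because the repetition cap was reached.
    return centers, iteration


if __name__ == "__main__":
    data = [(0, 0), (0, 2), (10, 0), (10, 2)]
    print(k_means_bounded(data, centers=[(0, 0), (10, 0)]))
```

The tolerance check handles the common case in which the centers settle, while the repetition cap guarantees termination even if the centers keep shifting by small amounts.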