We will now implement the k-means clustering algorithm. The program takes a CSV file as input, with one data item per line. Each data item is converted into a point. The algorithm then classifies these points into the specified number of clusters. Finally, the clusters are visualized on a graph using the matplotlib library:
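As a rough illustration of the input step only, the following sketch shows one way a CSV file with one data item per line could be turned into a list of points. It is not the book's own loader, which lives in the `common` module and is not shown here. The function name `read_points` is hypothetical, and the sketch assumes two numeric columns per line and no header row.

```python
import csv
import io


def read_points(csv_file):
    """Parse an open CSV file into a list of (x, y) float tuples.

    Hypothetical helper: assumes exactly two numeric columns per line
    and no header row.
    """
    points = []
    for row in csv.reader(csv_file):
        if not row:  # csv.reader yields an empty list for a blank line
            continue
        x, y = (float(value) for value in row)
        points.append((x, y))
    return points


# Usage: an in-memory file stands in for a real CSV file on disk.
sample = io.StringIO("1,2\n3.5,4\n\n5,6.25\n")
print(read_points(sample))
```

Representing each point as a tuple keeps it immutable, which is convenient when points are later compared or stored as centroids. The full program listing follows.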
# source_code/5/k-means_clustering.py
import math
import imp
import sys
import matplotlib.pyplot as plt
import matplotlib
sys.path.append('../common')
import common # noqa
matplotlib.style.use('ggplot')
# Returns k initial centroids for the given points.
def choose_init_centroids(points, k):
    # Use the first point as the first centroid.
    centroids = []
    centroids.append(points[0])
    while len(centroids) < k:
        # Find the point with the greatest possible distance
        # to its closest already chosen centroid.
        candidate...