As with KNN in the previous chapter, we'll use a structure to represent the algorithm and hold all of its hyperparameters:
struct KMeans {
    public let k: Int
The standard k-means algorithm was designed to be used only with Euclidean distance:
internal let distanceMetric = Euclidean.distance
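For reference, here is a minimal sketch of what a Euclidean distance function might look like. The name euclideanDistance is a hypothetical stand-in for illustration, not the Euclidean.distance helper used above, whose actual implementation may differ:

```swift
// Hypothetical stand-in for a Euclidean distance helper (an assumption,
// not the Euclidean.distance referenced in the text).
func euclideanDistance(_ a: [Double], _ b: [Double]) -> Double {
    precondition(a.count == b.count, "Vectors must have the same dimension")
    var sumOfSquares = 0.0
    // Accumulate the squared difference along every coordinate.
    for (x, y) in zip(a, b) {
        let diff = x - y
        sumOfSquares += diff * diff
    }
    return sumOfSquares.squareRoot()
}

// A 3-4-5 right triangle: the distance from the origin to (3, 4).
let exampleDistance = euclideanDistance([0.0, 0.0], [3.0, 4.0])
```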
We need several arrays to store different kinds of data during clustering.
Storage for samples:
internal var data: [[Double]] = []
Coordinates of centroids:
public var centroids: [[Double]] = []
An array that maps each sample to its cluster. It must have the same length as the data; for every sample, it stores the index of that sample's centroid in the centroids array:
private(set) var clusters: [Int] = []
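To make the relationship between these three arrays concrete, here is a small standalone sketch. The values and the names sampleData, sampleCentroids, and sampleClusters are made up for illustration and are not part of the KMeans structure:

```swift
// Four 2-D samples (illustrative values only).
let sampleData: [[Double]] = [[1.0, 1.0], [1.5, 2.0], [8.0, 8.0], [9.0, 9.5]]

// Two centroids, so k = 2 in this toy example.
let sampleCentroids: [[Double]] = [[1.25, 1.5], [8.5, 8.75]]

// One entry per sample: the index of its centroid in sampleCentroids.
let sampleClusters: [Int] = [0, 0, 1, 1]

// Look up the centroid assigned to the third sample (index 2).
let centroidOfThirdSample = sampleCentroids[sampleClusters[2]]
```

Because the cluster array is parallel to the data array, the i-th sample's centroid can always be reached through a double lookup, as in the last line.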
The within-cluster sum of squares (WCSS) is a measure that we'll use later to assess the quality of the result:
internal var WCSS: Double = 0.0
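As a rough illustration of the measure, WCSS adds up the squared Euclidean distance from every sample to the centroid of its cluster. The following is a sketch under that standard definition. The free function withinClusterSumOfSquares and the toy values are assumptions for illustration, not the way the KMeans structure computes the value:

```swift
// Hypothetical helper: sums squared distances from each sample to its centroid.
func withinClusterSumOfSquares(data: [[Double]],
                               centroids: [[Double]],
                               clusters: [Int]) -> Double {
    precondition(data.count == clusters.count, "One cluster index per sample")
    var total = 0.0
    for (sample, clusterIndex) in zip(data, clusters) {
        let centroid = centroids[clusterIndex]
        // Squared Euclidean distance, accumulated coordinate by coordinate.
        for (x, c) in zip(sample, centroid) {
            let diff = x - c
            total += diff * diff
        }
    }
    return total
}

// Toy example: every sample lies exactly 1.0 away from its centroid.
let toyWCSS = withinClusterSumOfSquares(
    data: [[0.0, 0.0], [2.0, 0.0], [10.0, 0.0], [12.0, 0.0]],
    centroids: [[1.0, 0.0], [11.0, 0.0]],
    clusters: [0, 0, 1, 1]
)
```

Lower values indicate tighter clusters, which is why the measure is useful for comparing results.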
For...