Unsupervised machine learning with clustering
Julia's package ecosystem provides a dedicated library for clustering. Unsurprisingly, it's called Clustering. We can simply execute
pkg> add Clustering
to install it. The Clustering package implements a few common clustering algorithms: k-means, affinity propagation, DBSCAN, and kmedoids.
The k-means algorithm
The k-means algorithm is one of the most popular clustering algorithms, providing a balanced combination of good results and good performance in a wide range of applications. However, one complication is that we're required to give it the number of clusters beforehand. More exactly, this number, called k (hence the first letter of the name of the algorithm), represents the number of centroids. A centroid is the point that represents a cluster, and each cluster has exactly one.
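To make the role of k concrete, here is a minimal sketch that calls the kmeans function from the Clustering package on made-up data. The data, the seed, and the variable names are assumptions chosen for illustration only. Note that the package expects one observation per column.

```julia
using Clustering, Random

Random.seed!(42)  # arbitrary seed, only to make the made-up data repeatable

# Hypothetical data: 2 features x 60 points, drawn around three offsets.
X = hcat(randn(2, 20), randn(2, 20) .+ 6.0, randn(2, 20) .- 6.0)

k = 3                      # the number of clusters has to be chosen up front
result = kmeans(X, k)      # columns of X are the observations

result.centers             # 2 x k matrix, one centroid per column
assignments(result)        # cluster index (1..k) for each of the 60 points
counts(result)             # number of points assigned to each cluster
```

Changing k changes the number of columns in result.centers, which is why a poor choice of k directly shapes the clusters we get back.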
The k-means algorithm applies an iterative approach: it places the centroids using the algorithm defined by the seeding procedure, then it assigns each point to its corresponding centroid, the mean to which it is closest...
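The iteration can be sketched in a few lines of plain Julia. This is an illustrative sketch of the standard k-means loop, not the code used inside the Clustering package. The function name simple_kmeans, the seeding choice (k random data points as the initial centroids), and the sample data are all assumptions made for the example. After the assignment step, the sketch moves each centroid to the mean of its assigned points and repeats until no point changes cluster.

```julia
using Random, Statistics

# Illustrative sketch of the k-means loop; not the Clustering package's code.
# X holds one point per column; `simple_kmeans` is a hypothetical name.
function simple_kmeans(X::AbstractMatrix{<:Real}, k::Integer; maxiter::Integer = 100)
    n = size(X, 2)
    # Seeding (assumed here): use k distinct random points as initial centroids.
    centers = float.(X[:, randperm(n)[1:k]])
    labels = zeros(Int, n)
    for _ in 1:maxiter
        # Assignment step: each point goes to the centroid it is closest to.
        newlabels = [argmin([sum(abs2, X[:, i] .- centers[:, j]) for j in 1:k]) for i in 1:n]
        newlabels == labels && break   # nothing moved, so we have converged
        labels = newlabels
        # Update step: move each centroid to the mean of its assigned points.
        for j in 1:k
            members = findall(==(j), labels)
            isempty(members) || (centers[:, j] = vec(mean(X[:, members], dims = 2)))
        end
    end
    return centers, labels
end

Random.seed!(1)                               # arbitrary seed for repeatability
X = hcat(randn(2, 20), randn(2, 20) .+ 8.0)   # two made-up, well-separated groups
centers, labels = simple_kmeans(X, 2)
```

Because the result depends on where the centroids start, a different seeding procedure can lead to different final clusters on the same data.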