Choosing the optimal number of clusters in Kmeans
One of the major drawbacks of the Kmeans clustering algorithm is that the number of clusters, K, must be specified by the user in advance. A commonly used technique for choosing K is the elbow method. The elbow method uses the Within Cluster Sum of Squares (WCSS), also called inertia, to find the optimal number of clusters (K). WCSS measures the total variation within clusters. It is calculated by taking the squared distance between each data point in a cluster and the corresponding cluster centroid and summing these squared distances across all clusters.
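The elbow method runs Kmeans for a range of predefined K values (for example, 2 to 10) and plots a graph, with the x axis showing the number of clusters (K) and the y axis showing the corresponding WCSS for each value of K. The optimal K is usually taken at the "elbow" of the curve, the point beyond which adding more clusters yields only a small reduction in WCSS.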
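The loop described above can be sketched roughly as follows. This is an illustrative sketch rather than the recipe's own code: it assumes scikit-learn and matplotlib are installed, uses synthetic data from make_blobs in place of a real dataset, and the helper name plot_elbow is hypothetical.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data (assumption): 300 points drawn around 4 centers.
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.8, random_state=42)

# Fit Kmeans for K = 2..10 and record the WCSS (inertia) of each fit.
k_values = list(range(2, 11))
wcss = []
for k in k_values:
    model = KMeans(n_clusters=k, n_init=10, random_state=42)
    model.fit(X)
    wcss.append(model.inertia_)


def plot_elbow(k_values, wcss):
    """Plot WCSS against K; the bend in the curve suggests a value for K."""
    import matplotlib.pyplot as plt  # imported here so the loop above runs without it

    plt.plot(k_values, wcss, marker="o")
    plt.xlabel("Number of clusters (K)")
    plt.ylabel("WCSS (inertia)")
    plt.title("Elbow method")
    plt.show()


# plot_elbow(k_values, wcss)  # uncomment to draw the elbow curve
```

Reading the elbow is a visual judgment, so the plot is best treated as a guide rather than a definitive answer.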
In this recipe, we will explore how to use the elbow method to identify the optimal number of clusters (K). We will use some custom code alongside...