In this section, we analyze some common methods for evaluating the performance of a clustering algorithm and for finding the optimal number of clusters.
Evaluation metrics
Minimizing the inertia
One of the biggest drawbacks of K-means and similar algorithms is that the number of clusters must be specified explicitly in advance. Sometimes this piece of information is imposed by external constraints (in the breast cancer example, there are only two possible diagnoses), but in many cases (when an exploratory analysis is needed), the data scientist has to check different configurations and evaluate them. The simplest way to evaluate K-means performance and choose an appropriate number of clusters...
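As a minimal sketch of checking different configurations, the following snippet fits K-means for several candidate numbers of clusters and records the inertia of each fit (in scikit-learn, the `inertia_` attribute is the sum of squared distances between each sample and its closest centroid). The synthetic dataset generated with `make_blobs`, the range of k values, the random seed, and the helper name `inertia_by_k` are all assumptions chosen for illustration; they do not come from the original example.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in data (assumption): 3 well-separated Gaussian blobs.
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.8, random_state=1000)


def inertia_by_k(X, k_values, random_state=1000):
    """Fit K-means once per value of k and return the inertia of each fit.

    Hypothetical helper written for this sketch, not a scikit-learn function.
    """
    inertias = []
    for k in k_values:
        km = KMeans(n_clusters=k, n_init=10, random_state=random_state)
        km.fit(X)
        # Sum of squared distances of samples to their closest centroid
        inertias.append(km.inertia_)
    return inertias


k_values = list(range(1, 9))
inertias = inertia_by_k(X, k_values)

for k, value in zip(k_values, inertias):
    print(f"k={k}: inertia={value:.2f}")
```

Because the inertia generally keeps shrinking as more clusters are added, the lowest value alone is not a useful selection criterion. A common heuristic is to look for the value of k after which the decrease becomes much slower.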