Summary
Performing clustering is only part of the job. It is just as important to be able to apply several different clustering algorithms and to evaluate each one with multiple methods, so that the right tool can be chosen for the task. In this chapter, we learned various methods for choosing the number of clusters, including judgment-based methods, such as visual inspection of cluster overlap and elbow determination using the sum of squared errors, and objective methods, such as evaluating the silhouette score. Each of these methods has strengths and weaknesses: the more abstract and quantified the measure, the further removed we are from understanding why a particular clustering seems to be failing or succeeding. However, as we have seen, making judgments is often difficult, especially with complex data, and this is where quantifiable methods, in particular the silhouette score, tend to shine. In practice, sometimes one measure will not give a clear answer while another does; this is...
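As a compact reminder of how the two numeric measures mentioned here behave, the following is a minimal sketch rather than the chapter's own implementation. It assumes a toy data set of three well-separated 2-D blobs and uses only the Python standard library. The helpers `make_blobs`, `assign`, `kmeans`, `sse`, and `silhouette_score` are hypothetical names written for this illustration, and the k-means routine uses a simple farthest-point initialization so that the run is deterministic.

```python
import math
import random


def make_blobs(centers, n_per_center=30, spread=0.8, seed=0):
    """Toy data (an assumption for this sketch): Gaussian noise around each center."""
    rng = random.Random(seed)
    return [(cx + rng.gauss(0, spread), cy + rng.gauss(0, spread))
            for cx, cy in centers for _ in range(n_per_center)]


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points]


def kmeans(points, k, n_iter=100):
    """Lloyd's algorithm with farthest-point initialization (deterministic)."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from all centroids chosen so far.
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    labels = assign(points, centroids)
    for _ in range(n_iter):
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
        new_labels = assign(points, centroids)
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
    return labels, centroids


def sse(points, labels, centroids):
    """Sum of squared distances from each point to its assigned centroid."""
    return sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))


def silhouette_score(points, labels):
    """Mean of (b - a) / max(a, b) over all points; needs at least two clusters."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # a singleton cluster contributes a silhouette of 0
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster.
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for other_lab, other in clusters.items() if other_lab != lab)
        total += (b - a) / max(a, b)
    return total / len(points)


points = make_blobs([(0, 0), (10, 0), (5, 9)])
results = {}
for k in range(2, 7):
    labels, centroids = kmeans(points, k)
    results[k] = (sse(points, labels, centroids), silhouette_score(points, labels))
    print(f"k={k}  SSE={results[k][0]:9.2f}  silhouette={results[k][1]:.3f}")

# The silhouette score gives a single number to maximize; the SSE has to be
# read by eye for the "elbow" where further increases in k stop helping much.
best_k = max(results, key=lambda k: results[k][1])
print("k with the highest silhouette score:", best_k)
```

On data this cleanly separated, both measures point to the same number of clusters. The value of checking more than one measure shows up on messier data, where the SSE curve may bend gradually while the silhouette score still has a clear peak, or the reverse. In everyday work, a library such as scikit-learn (`KMeans` with its `inertia_` attribute, and `sklearn.metrics.silhouette_score`) would normally replace these hand-written helpers.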