Introduction
In the previous chapter, we introduced the concept of clustering and practiced it using k-means clustering. However, several issues remained unresolved, such as how to choose the number of clusters and how to evaluate a clustering technique once the clusters are created. This chapter expands on the previous one and fills in some of those gaps.
There are several ways to choose the number of clusters for k-means clustering: some rely on judgment, and others use more technical quantitative measures. You can even use clustering techniques that don’t require you to state the number of clusters explicitly; however, these methods have their own tradeoffs and hyperparameters that need to be tuned. We’ll study these in this chapter.
So far, we have also dealt only with data that is fairly easy for k-means to handle: continuous variables or binary variables. In this chapter, we’ll explain how to deal with data containing...