Introduction
So far, we have covered two popular approaches to the clustering problem: k-means and hierarchical clustering. Each technique has its own pros and cons stemming from how it is carried out. Let's briefly revisit where we have been in the first two chapters to gain further context for where we will be going in this chapter.
In the challenge space of unsupervised learning, you are presented with a collection of feature data but no complementary labels telling you what those features mean. While you do not get a discrete view of the target labels, you can still recover some semblance of structure from the data by grouping similar data points into clusters and examining what the members of each group have in common. The first approach we covered to achieve this goal of clustering similar data points is k-means.
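As a quick illustration of this setting, the following sketch fits k-means to synthetic, unlabeled feature data and inspects what the discovered groups have in common. It is not taken from the chapter's own examples; the use of scikit-learn, the three-cluster dataset, and the parameter choices are assumptions made purely for illustration.

# A minimal sketch of the unsupervised setting described above, assuming
# scikit-learn is available. The dataset and parameters are illustrative only.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Feature data only -- the generated labels are discarded to mimic the
# unsupervised scenario, where no target labels are provided.
X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

# Group the points into three clusters based on distance to the centroids.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
cluster_ids = kmeans.fit_predict(X)

# Examine what is similar within each discovered group.
for cluster in np.unique(cluster_ids):
    members = X[cluster_ids == cluster]
    print(f"Cluster {cluster}: {len(members)} points, "
          f"feature means = {members.mean(axis=0).round(2)}")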
k-means works best for simpler data challenges where speed is paramount. Because it simply looks at the closest data points, there is not a lot of computational...