Chapter 6. Clustering
Things that have a common quality ever quickly seek their kind. | ||
--Marcus Aurelius |
In previous chapters, we covered multiple learning algorithms: linear and logistic regression, C4.5, naive Bayes, and random forests. In each case we were required to train the algorithm by providing features and a desired output. In linear regression, for example, the desired output was the weight of an Olympic swimmer, whereas for the other algorithms we provided a class: whether the passenger survived or perished. These are examples of supervised learning algorithms: we tell our algorithm the desired output and it will attempt to learn a model that reproduces it.
There is another class of learning algorithm referred to as unsupervised learning. Unsupervised algorithms are able to operate on the data without a set of reference answers. We may not even know ourselves what structure lies within the data; the algorithm will attempt to determine the structure for...