Recall from Chapter 1, The Fundamentals of Machine Learning that the goal of unsupervised learning is to discover hidden structures or patterns in unlabeled training data. Clustering, or cluster analysis, is the task of grouping observations so that members of the same group, or cluster, are more similar to each other by some metric than they are to members of other clusters. As with supervised learning, we will represent an observation as an n-dimensional vector.
For example, assume that your training data consists of the samples plotted in the following figure:
Clustering might produce the following two groups, indicated by squares and circles:
Clustering can also produce the following four groups:
Clustering is commonly used to explore a dataset. Social networks can be clustered to identify communities and to suggest missing connections between...