Understanding clustering
Clustering is an unsupervised machine learning task that automatically divides data into clusters, or groupings of similar items. It does this without having been told ahead of time what the groups should look like. Because we may not even know what we're looking for, clustering is used for knowledge discovery rather than prediction. It provides insight into the natural groupings found within data.
Without advance knowledge of what comprises a cluster, how could a computer possibly know where one group ends and another begins? The answer is simple. Clustering is guided by the principle that records inside a cluster should be very similar to each other, but very different from those outside. As you will see later, the definition of similarity might vary across applications, but the basic idea is always the same: group the data such that related elements are placed together.
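To make this principle concrete, the following minimal sketch compares the average distance between records in the same cluster with the average distance between records in different clusters. The six points, their labels, the helper name mean_pairwise_distances, and the choice of Euclidean distance are all assumptions made for illustration; they are not taken from any particular dataset or clustering algorithm.

```python
from itertools import combinations
from math import dist  # Euclidean distance, available in Python 3.8+

# Hypothetical 2-D records with hand-assigned cluster labels (illustrative only).
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0),
          (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
labels = ["a", "a", "a", "b", "b", "b"]


def mean_pairwise_distances(points, labels):
    """Return (mean within-cluster distance, mean between-cluster distance).

    Assumes at least one same-cluster pair and one cross-cluster pair exist.
    """
    within, between = [], []
    for i, j in combinations(range(len(points)), 2):
        d = dist(points[i], points[j])
        # A pair counts as "within" only when both records share a label.
        (within if labels[i] == labels[j] else between).append(d)
    return sum(within) / len(within), sum(between) / len(between)


within, between = mean_pairwise_distances(points, labels)
print(f"mean within-cluster distance:  {within:.2f}")
print(f"mean between-cluster distance: {between:.2f}")
```

With a sensible grouping, the within-cluster figure should be much smaller than the between-cluster figure. A different similarity measure could be swapped in for dist without changing the overall idea.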
The resulting clusters can then be used for action. For instance, you might find clustering...