Clustering divides a large dataset into smaller subsets, or clusters, whose members share similar characteristics. As an example, consider a marketing scenario. You plan to send advertising material to a large customer base, but the cost of sending it to every customer is prohibitive. Clustering the customer data returns groups of customers with similar characteristics. You can then review the groups and choose one to target.
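As a minimal sketch of this workflow, the Python code below groups a handful of customers by annual spend and visits per year using K-means, one of the methods described next. The customer data, the choice of three clusters, and the helper names `standardize`, `farthest_point_init`, and `kmeans` are illustrative assumptions, not part of any particular tool. The features are rescaled to z-scores first so that dollar amounts do not swamp visit counts. The deterministic farthest-point seeding is used only to make the result reproducible; libraries commonly use the randomized k-means++ seeding instead.

```python
import math
import statistics

# Hypothetical customer records: (annual spend in dollars, store visits per year).
# These numbers are invented for illustration only.
customers = [
    (200.0, 2), (250.0, 3), (220.0, 2),        # low spend, infrequent visits
    (1200.0, 20), (1100.0, 18), (1300.0, 22),  # high spend, frequent visits
    (600.0, 10), (650.0, 9), (580.0, 11),      # mid-range
]


def standardize(points):
    """Rescale each feature to mean 0 and standard deviation 1 (z-scores)."""
    columns = list(zip(*points))
    means = [statistics.mean(col) for col in columns]
    stdevs = [statistics.pstdev(col) for col in columns]
    return [tuple((x - m) / s for x, m, s in zip(p, means, stdevs)) for p in points]


def farthest_point_init(points, k):
    """Deterministic seeding: start from the first point, then repeatedly add
    the point farthest from every centroid chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids


def kmeans(points, k, max_iterations=100):
    """Lloyd's K-means: returns (centroids, labels), where labels[i] is the
    index of the cluster that points[i] belongs to."""
    centroids = farthest_point_init(points, k)
    labels = []
    for _ in range(max_iterations):
        # Assignment step: attach every point to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(tuple(statistics.mean(dim) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: the assignments can no longer change
        centroids = new_centroids
    return centroids, labels


# Cluster the standardized data, then report each group in the original units.
labels = kmeans(standardize(customers), k=3)[1]
groups = {}
for customer, label in zip(customers, labels):
    groups.setdefault(label, []).append(customer)
for label, members in sorted(groups.items()):
    print(f"group {label}: {members}")
```

On this toy data the low-spend, high-spend, and mid-range customers end up in three separate groups. At that point you could review each group and pick, for example, the high-spend group as the advertising target.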
Two major clustering methods are hierarchical clustering and K-means clustering. Hierarchical clustering is more thorough and therefore more time-consuming. It generates a nested series of clusterings that ranges from a single cluster containing all n data points to n clusters, each containing a single data point. K-means clustering is quicker, but the number of clusters, k, must be fixed in advance, either by the user or by another function. For example...