An overview of clustering
Clustering is the division of data into groups of similar objects. Each group (cluster) consists of objects that are similar to one another and dissimilar to objects in other groups. The goal of clustering is to determine the intrinsic grouping in a set of unlabeled data. Clustering is used in a wide range of application areas, including data mining (DNA analysis, marketing studies, insurance studies, and so on), text mining, information retrieval, statistical computational linguistics, and corpus-based computational lexicography. Some of the requirements that clustering algorithms must fulfill are as follows:
- Scalability
- Dealing with various types of attributes
- Discovering clusters of arbitrary shapes
- The ability to deal with noise and outliers
- Interpretability and usability
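To make the idea of grouping similar objects concrete, the following is a minimal sketch of k-means, one common clustering algorithm chosen here purely for illustration. The function name `kmeans`, the sample points, and the choice of the first `k` points as initial centroids are all assumptions made for this example. Note that k-means favors compact, roughly spherical groups, so it does not meet every requirement listed above (for instance, discovering clusters of arbitrary shapes).

```python
import math


def kmeans(points, k, iterations=10):
    """Minimal k-means sketch: returns (labels, centroids)."""
    # Assumption: the first k points serve as the initial centroids.
    # This is deterministic but not robust; real implementations
    # usually pick the starting centroids more carefully.
    centroids = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
    return labels, centroids


# Hypothetical unlabeled data: two visibly separate groups of 2D points.
points = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (7.9, 8.2), (0.9, 1.1), (8.1, 7.9)]
labels, centroids = kmeans(points, k=2)
print(labels)
```

Points that end up with the same label are closer to their shared centroid than to the other one, which is the "similar within a group, dissimilar across groups" property described above.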
The following diagram shows a representation of clustering: