Identify clustering machine learning scenarios
Clustering is an unsupervised machine learning scenario where algorithms are employed to try to identify patterns in data. Unlike supervised learning, where training data has labels and features, unsupervised learning does not. The main goal of clustering is to be able to let the machine learning algorithms discover natural groupings within the data based on the similarities in the data points themselves.
Just as supervised learning had its algorithms, there are several popular algorithms available to use with clustering scenarios:
- K-means clustering: This algorithm partitions the data into K distinct, non-overlapping subsets (or clusters) based on the mean distance from the centroid of each cluster. The value of K needs to be specified beforehand.
- Hierarchical clustering: Builds a hierarchy of clusters either with a bottom-up approach (agglomerative) or a top-down approach (divisive). It does not require pre-specification...