Clustering with affinity propagation
Clustering aims to partition data into groups called clusters. Clustering is usually unsupervised, in the sense that no labeled examples are given. Some clustering algorithms require a guess for the number of clusters, while others don't. Affinity propagation falls in the latter category. Each item in a dataset can be mapped into Euclidean space using its feature values. Affinity propagation, as used here, depends on a matrix containing the Euclidean distances between data points. Since this matrix has one entry for every pair of points, its size grows quadratically with the number of samples, so we should be careful not to take up too much memory. The scikit-learn library has utilities to generate structured data. Create three data blobs as follows:
from sklearn import datasets
x, _ = datasets.make_blobs(n_samples=100, centers=3, n_features=2, random_state=10)
Call the euclidean_distances() function to create the aforementioned matrix:
from sklearn.metrics import euclidean_distances
S = euclidean_distances(x)
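To get a feel for the memory concern, we can inspect the matrix. The following is a minimal, self-contained sketch; the use of the NumPy attributes shape and nbytes is our own addition for illustration, not part of the original recipe:

```python
import numpy as np
from sklearn import datasets
from sklearn.metrics import euclidean_distances

# Same three blobs as in the text
x, _ = datasets.make_blobs(n_samples=100, centers=3, n_features=2, random_state=10)

# Pairwise Euclidean distances: one row and one column per sample
S = euclidean_distances(x)

print(S.shape)   # (n_samples, n_samples)
print(S.nbytes)  # bytes held by the matrix; grows with n_samples squared

# A distance matrix is symmetric with non-negative entries
print(np.allclose(S, S.T), bool((S >= 0).all()))
```

Doubling the number of samples quadruples the number of entries, which is why the matrix becomes a bottleneck well before the raw feature data does.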
Cluster the data using this matrix, so that each point is labeled with its corresponding cluster:
aff_pro = cluster...
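As an illustration of the clustering step, here is a minimal sketch of one way to run affinity propagation on precomputed values with scikit-learn. It rests on assumptions of our own and may differ from the call used in this recipe: we pass affinity="precomputed", we feed the algorithm negated squared distances (affinity propagation works with similarities, where larger means more alike), and we fix random_state, which requires a reasonably recent scikit-learn release. The names similarities, model, and labels are illustrative.

```python
from sklearn import cluster, datasets
from sklearn.metrics import euclidean_distances

# Same three blobs as in the text
x, _ = datasets.make_blobs(n_samples=100, centers=3, n_features=2, random_state=10)

# Assumption: turn distances into similarities by negating the squared distances
similarities = -euclidean_distances(x, squared=True)

# Assumption: tell the estimator that the input is already a similarity matrix
model = cluster.AffinityPropagation(affinity="precomputed", random_state=0)
labels = model.fit(similarities).labels_

# One label per sample; the number of clusters is chosen by the algorithm
print(labels.shape)
print(len(set(labels)))
```

Note that the number of clusters found depends on the preference parameter, which by default is set to the median of the input similarities, so it will not necessarily match the three blobs that were generated.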