Detecting hidden structures in a dataset with clustering
A large part of unsupervised learning is devoted to the clustering problem. The goal is to group similar points together in a totally unsupervised way. Clustering is a hard problem, as the very definition of clusters (or groups) is not necessarily well posed. In most datasets, stating that two points should belong to the same cluster may be context-dependent or even subjective.
There are many clustering algorithms. We will see a few of them in this recipe, applied to a toy example.
How to do it...
Let's import the libraries:
In [1]: from itertools import permutations import numpy as np import sklearn import sklearn.decomposition as dec import sklearn.cluster as clu import sklearn.datasets as ds import sklearn.grid_search as gs import matplotlib.pyplot as plt %matplotlib inline
Let's generate a random dataset with three clusters:
In [2]: X, y = ds.make_blobs(n_samples=200, n_features...