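Let's suppose that we have a dataset made up of n m-dimensional points drawn from a data generating process, p_data:

$$X = \{x_1, x_2, \ldots, x_n\} \quad \text{where } x_i \in \mathbb{R}^m \text{ and } x_i \sim p_{data}$$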
In many cases, it's possible to assume that the blobs (that is, the densest and most separated regions) are symmetric around a mean (in general, the spread is different along each axis), so that they can be represented as multivariate Gaussian distributions. Under this assumption, we can imagine that the probability of each sample is obtained as a weighted sum of k (the number of clusters) multivariate Gaussians, each parametrized by a mean vector, μ_j, and a covariance matrix, Σ_j, with the weights w_j non-negative and summing to 1:

$$p(x_i) = \sum_{j=1}^{k} w_j \, \mathcal{N}(x_i; \mu_j, \Sigma_j) \quad \text{with } w_j \geq 0 \text{ and } \sum_{j=1}^{k} w_j = 1$$
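
As a minimal sketch of the weighted sum of Gaussians described above, the following NumPy snippet evaluates the mixture density at a few points. The function names (`gaussian_pdf`, `mixture_pdf`), the weights, and all parameter values are hypothetical and chosen only for illustration; in practice, the parameters would be estimated from the data.

```python
import numpy as np

def gaussian_pdf(X, mean, cov):
    """Density of N(mean, cov) evaluated at each row of X (shape: n x m)."""
    m = mean.shape[0]
    diff = X - mean
    # Squared Mahalanobis distance of every point from the mean
    maha = np.einsum("ij,ij->i", diff, np.linalg.solve(cov, diff.T).T)
    # Log of the normalization constant, using the log-determinant for stability
    _, logdet = np.linalg.slogdet(cov)
    log_norm = -0.5 * (m * np.log(2.0 * np.pi) + logdet)
    return np.exp(log_norm - 0.5 * maha)

def mixture_pdf(X, weights, means, covs):
    """Weighted sum of k Gaussian densities (weights are assumed to sum to 1)."""
    return sum(w * gaussian_pdf(X, mu, S)
               for w, mu, S in zip(weights, means, covs))

# Hypothetical two-component mixture in two dimensions (illustrative values)
weights = np.array([0.6, 0.4])
means = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
covs = [np.eye(2), np.array([[1.0, 0.5], [0.5, 2.0]])]

# Two points at the component means and one point far from both
X = np.array([[0.0, 0.0], [3.0, 3.0], [10.0, 10.0]])
densities = mixture_pdf(X, weights, means, covs)
print(densities)
```

The points located at the component means receive a much higher density than the point far from both blobs, which is the behavior we expect from a model that describes the dataset as a set of dense, separated regions.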
This model is called a Gaussian mixture and can be employed as either a soft- or a hard-clustering algorithm. The former is the native mode because each point is associated with a probability vector representing the membership degree with...