Gaussian mixture is one of the best-known soft clustering approaches, with many practical applications. It can be considered the father of k-means, because the two work in a very similar way; unlike k-means, however, given a sample $x_i \in X$ and $k$ clusters (each represented by a Gaussian distribution), it provides a probability vector, $[p(x_i \in C_1), \ldots, p(x_i \in C_k)]$, instead of a single hard assignment.
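More generally, if the dataset $X$ has been sampled from a data-generating process $p_{data}$, a Gaussian mixture model is based on the following assumption:

$$p_{data}(x) \approx \sum_{j=1}^{k} w_j \, N(x; \mu_j, \Sigma_j), \qquad w_j \ge 0, \quad \sum_{j=1}^{k} w_j = 1$$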
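In other words, the data-generating process is approximated by a weighted sum of multivariate Gaussian distributions, one per cluster, whose weights are nonnegative and sum to 1. The probability density function of a single multivariate Gaussian in $n$ dimensions, with mean vector $\mu$ and covariance matrix $\Sigma$, is as follows:

$$N(x; \mu, \Sigma) = \frac{1}{\sqrt{(2\pi)^{n} \det \Sigma}} \exp\left(-\frac{1}{2} (x - \mu)^{T} \Sigma^{-1} (x - \mu)\right)$$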
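To make the weighted-sum assumption and the probability vector more concrete, the following is a minimal sketch, not a reference implementation. It is restricted to the bivariate case so that the 2x2 covariance matrix can be inverted by hand with the standard library only. The function names (gaussian_pdf_2d, mixture_pdf, membership_probabilities) and all the parameter values are illustrative assumptions; in practice, the weights, means, and covariances would be learned from the data rather than fixed by hand.

```python
import math


def gaussian_pdf_2d(x, mean, cov):
    """Density of a bivariate Gaussian with the given mean and covariance at x."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # Quadratic form (x - mean)^T cov^{-1} (x - mean), using the 2x2 inverse.
    quad = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    # For n = 2 the normalization constant is 2 * pi * sqrt(det).
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))


def mixture_pdf(x, weights, means, covs):
    """Weighted sum of the component densities evaluated at x."""
    return sum(w * gaussian_pdf_2d(x, m, c)
               for w, m, c in zip(weights, means, covs))


def membership_probabilities(x, weights, means, covs):
    """Probability vector [p(x in C_1), ..., p(x in C_k)] via Bayes' rule."""
    joint = [w * gaussian_pdf_2d(x, m, c)
             for w, m, c in zip(weights, means, covs)]
    total = sum(joint)  # equals mixture_pdf(x, ...)
    return [j / total for j in joint]


# Hypothetical parameters for k = 2 clusters (chosen by hand for illustration).
weights = [0.6, 0.4]
means = [(0.0, 0.0), (3.0, 3.0)]
covs = [((1.0, 0.0), (0.0, 1.0)),
        ((1.5, 0.5), (0.5, 1.0))]

sample = (0.5, 0.2)
print("mixture density:", mixture_pdf(sample, weights, means, covs))
print("membership vector:", membership_probabilities(sample, weights, means, covs))
```

The membership vector always sums to 1, and a sample close to the first mean receives most of its probability mass from the first component. For points very far from every component, the densities can underflow to zero, so a more careful version would work with log-densities.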
The influence of each component of every multivariate Gaussian depends on the structure of the covariance matrix...