7.6 Mixture models and clustering
Clustering or cluster analysis is the data analysis task of grouping objects in such a way that objects in a given group are closer to each other than to those in the other groups. The groups are called clusters and the degree of closeness can be computed in many different ways, for example, by using metrics, such as the Euclidean distance. If instead we take the probabilistic route, then a mixture model arises as a natural candidate to solve clustering tasks.
Performing clustering using probabilistic models is usually known as model-based clustering. Using a probabilistic model allows us to compute the probability of each data point belonging to each one of the clusters. This is known as soft clustering instead of hard clustering, where each data point belongs to a cluster with a probability of 0 or 1. We can turn soft clustering into hard clustering by introducing some rule or boundary. In fact, you may remember that this is exactly what we do to...