A mixture model is a probabilistic model that represents the presence of sub-populations within an overall population. It is used to make statistical inferences about the sub-populations given only observations of the pooled population, without knowing which sub-population each observation came from.
A Gaussian Mixture Model (GMM) is a mixture model represented as a weighted sum of Gaussian component densities. Its parameters (the component weights, means, and covariances) are estimated from training data using the iterative Expectation-Maximization (EM) algorithm, or by Maximum A Posteriori (MAP) estimation starting from a previously trained model.
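To make the EM procedure concrete, here is a minimal sketch for the one-dimensional case, where the density is p(x) = sum_i w_i * N(x | mu_i, var_i). It is not Spark's implementation: the names `normal_pdf`, `fit_gmm_1d`, `convergence_tol`, and `max_iterations` are hypothetical, and the quantile-based initialisation and the variance floor are illustrative assumptions.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(data, k, convergence_tol=1e-6, max_iterations=200):
    """Fit a 1-D Gaussian mixture with EM; returns (weights, means, variances)."""
    n = len(data)
    ordered = sorted(data)
    # Assumed initialisation: means at evenly spaced quantiles, pooled variance,
    # equal weights. Other starting points are equally valid.
    means = [ordered[int((i + 0.5) * n / k)] for i in range(k)]
    overall_mean = sum(data) / n
    pooled_var = sum((x - overall_mean) ** 2 for x in data) / n
    variances = [pooled_var] * k
    weights = [1.0 / k] * k
    prev_ll = -math.inf

    for _ in range(max_iterations):
        # E-step: responsibility of each component for each point, plus the
        # log-likelihood of the data under the current parameters.
        resp = []
        ll = 0.0
        for x in data:
            joint = [w * normal_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)]
            total = sum(joint)
            ll += math.log(total)
            resp.append([j / total for j in joint])

        # M-step: re-estimate each component from its weighted points.
        for i in range(k):
            r_i = sum(r[i] for r in resp)
            weights[i] = r_i / n
            means[i] = sum(r[i] * x for r, x in zip(resp, data)) / r_i
            var = sum(r[i] * (x - means[i]) ** 2 for r, x in zip(resp, data)) / r_i
            variances[i] = max(var, 1e-6)  # assumed floor to avoid a collapsed component

        # Stop once the log-likelihood changes by less than the tolerance.
        if abs(ll - prev_ll) < convergence_tol:
            break
        prev_ll = ll

    return weights, means, variances
```

Calling `fit_gmm_1d(samples, k=2)` on a list of floats returns the estimated weights, means, and variances. The sketch omits safeguards a production implementation would need, such as log-space arithmetic to avoid underflow and handling of components that receive no responsibility.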
The spark.ml implementation uses the EM algorithm.
It has the following parameters:
- k: Number of desired clusters
- convergenceTol: Maximum change in log-likelihood between iterations at which convergence is considered achieved
- maxIterations: Maximum number of iterations to perform without reaching convergence
- initialModel: Optional model from which to start the EM algorithm
(if this parameter is omitted, a random...