Summary
The EM algorithm is a computational approach for finding maximum likelihood estimates, typically when part of the data is unobserved. It alternates between two steps: the E-step (expectation), which computes the expected log-likelihood given the current parameter estimates, and the M-step (maximization), which updates the parameters by maximizing that expectation. The algorithm usually converges quickly in practice and is applied in many areas.
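As a compact reminder, the two steps can be written as follows. The notation is an assumption made here for illustration: X denotes the observed data, Z the unobserved (missing or latent) data, and \theta^{(t)} the parameter estimate at iteration t.

```latex
% E-step: expected complete-data log-likelihood under the current estimate
Q\bigl(\theta \mid \theta^{(t)}\bigr)
  = \mathrm{E}_{Z \mid X,\, \theta^{(t)}}\bigl[\log L(\theta;\, X, Z)\bigr]

% M-step: update the parameters by maximizing that expectation
\theta^{(t+1)} = \arg\max_{\theta}\; Q\bigl(\theta \mid \theta^{(t)}\bigr)
```

The two steps are repeated until the change in \theta (or in the likelihood) falls below a chosen tolerance.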
In this chapter we saw applications in two areas: clustering and the imputation of missing values. Finding an optimal clustering (for example, the partition that minimizes the k-means objective) is an NP-hard problem; loosely speaking, no known algorithm finds the exact solution in a reasonable time. The EM algorithm is therefore used to iteratively find a good solution. In clustering, the EM idea underlies the k-means algorithm, but also (not shown in this chapter) model-based clustering and mixture models in general.
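The connection between k-means and EM can be illustrated with a minimal sketch. It is written in plain Python and is not the chapter's own implementation: the function names `sq_dist` and `kmeans`, the optional `init` argument, and the toy data are all assumptions made for this example. The assignment step plays the role of a hard-assignment E-step, and the recomputation of the cluster means plays the role of the M-step.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, init=None, max_iter=100, seed=0):
    """Hypothetical helper: k-means written as alternating E-like and M-like steps."""
    # Starting centers: user-supplied, or k observations drawn at random.
    if init is None:
        init = random.Random(seed).sample(points, k)
    centers = [tuple(c) for c in init]
    labels = None
    for _ in range(max_iter):
        # E-step analogue: assign every point to its nearest center.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the centers will not change
        labels = new_labels
        # M-step analogue: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = tuple(sum(col) / len(members)
                                   for col in zip(*members))
    return centers, labels

# Toy data (assumed for illustration): two well-separated groups.
points = [(0, 0), (10, 10), (0, 1), (1, 0), (10, 11), (11, 10)]
centers, labels = kmeans(points, k=2, init=[(0, 0), (10, 10)])
print(labels)
print(centers)
```

Like EM in general, this procedure only guarantees a local optimum, so the result can depend on the starting centers; running it from several random starts and keeping the best solution is a common remedy.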
Missing values occur frequently in practical data sets. Data scientists probably spend most of their time on data pre-processing, and thus they also have to impute missing values...