So how does mice come up with the imputed values?
Let's focus first on the univariate case, where only one column contains missing data and we use all the other (completed) columns to impute its missing values. We'll generalize to the multivariate case afterward.
mice actually has a few different imputation methods up its sleeve, each best suited for a particular use case. mice will often choose sensible defaults based on the data type (continuous, binary, non-binary categorical, and so on).
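To see which defaults mice picks for a given data set, you can set up the imputation without running it. The following is a minimal sketch that assumes the mice package is installed and uses nhanes2, a small example data set that ships with the package. The variable name setup is just for illustration.

```r
library(mice)  # assumes the mice package is installed

# nhanes2 ships with mice and mixes numeric and factor columns
# maxit = 0 builds the imputation setup without running any iterations
setup <- mice(nhanes2, maxit = 0, printFlag = FALSE)

# One entry per column, naming the method mice would use for it;
# an empty string means the column has nothing to impute
setup$method
```

Passing a method argument to mice() overrides these defaults, for example method = "norm" to request the method discussed next.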
The most important method is what the package calls the norm method. This method is very much like stochastic regression. Each of the m imputations is created by adding a normal noise term to the output of a linear regression predicting the missing variable. What makes this slightly different from just repeating stochastic regression m times is that the norm method also integrates uncertainty about the regression coefficients used in the predictive linear model.
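To make the idea concrete, here is a rough base-R sketch of one way to produce m imputations in the spirit of the norm method. It is not mice's actual source code, and mice's own implementation may differ in its details. The toy data and the helper impute_norm_once are made up for illustration. Each call draws a residual variance and a set of coefficients around the least-squares fit, and only then predicts and adds normal noise.

```r
set.seed(42)

# Toy data (made up for illustration): y depends linearly on x
n <- 100
x <- rnorm(n)
y <- 2 + 3 * x + rnorm(n)
y[sample(n, 20)] <- NA           # knock out 20 values of y

obs   <- !is.na(y)
X_obs <- cbind(1, x[obs])        # design matrix for the observed rows
X_mis <- cbind(1, x[!obs])       # design matrix for the rows to impute

# Hypothetical helper: a single imputation draw
impute_norm_once <- function(y_obs, X_obs, X_mis) {
  XtX_inv  <- solve(crossprod(X_obs))
  beta_hat <- XtX_inv %*% crossprod(X_obs, y_obs)   # least-squares fit
  resid    <- y_obs - X_obs %*% beta_hat
  df       <- nrow(X_obs) - ncol(X_obs)

  # 1. Draw a plausible residual variance
  sigma2_star <- sum(resid^2) / rchisq(1, df)

  # 2. Draw coefficients around the fitted ones (coefficient uncertainty)
  beta_star <- beta_hat +
    sqrt(sigma2_star) * t(chol(XtX_inv)) %*% rnorm(ncol(X_obs))

  # 3. Predict the missing values and add normal noise
  as.vector(X_mis %*% beta_star) +
    rnorm(nrow(X_mis), sd = sqrt(sigma2_star))
}

m <- 5
imputations <- replicate(m, impute_norm_once(y[obs], X_obs, X_mis))
dim(imputations)   # 20 rows (missing cells) by 5 columns (imputations)
```

Plain stochastic regression repeated m times would skip steps 1 and 2 and reuse beta_hat and the estimated residual variance in every draw, so the imputations would understate how uncertain the fitted model itself is.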
Recall that the regression coefficients in a linear regression...