Performing multivariate imputation by chained equations
Multivariate imputation methods, as opposed to univariate imputation, use multiple variables to estimate the missing values. In other words, the missing values of a variable are modeled based on the other variables in the dataset. Multivariate Imputation by Chained Equations (MICE) models each variable with missing values as a function of the remaining variables and uses that estimate for imputation.
The following steps are required to perform MICE:
- A simple univariate imputation is performed for every variable with missing data, for example, median imputation.
- One specific variable is selected, say,
var_1
, and the missing values are set back to missing. - A model is trained to predict
var_1
using the remaining variables as input features. - The missing values of
var_1
are replaced with the new estimates. - Steps 2 to 4 are repeated for each of the remaining variables.
Once all the variables have been...