Proper variance estimation with missing values
Missing values are a major problem in practice, and standard estimation routines are typically not designed to handle them. In the following we discuss a method that deals with missing values adequately when estimating the variance (uncertainty) of an estimator.
Because of unanswered questions or measurement errors, a data set often consists of n observations on p variables in which some of the values are missing (NA).
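To make this structure concrete, the following Python sketch builds a small data matrix with missing entries and counts them. The values are purely hypothetical and chosen for illustration; `np.nan` is assumed here to play the role of NA.

```python
import numpy as np

# Hypothetical data: n = 5 observations (rows), p = 3 variables (columns).
# np.nan stands in for a missing value (NA).
X = np.array([
    [23.0, 1.8, 4.1],
    [31.0, np.nan, 3.9],
    [np.nan, 2.2, 5.0],
    [45.0, 2.9, np.nan],
    [38.0, 2.4, 4.4],
])

n, p = X.shape
missing = np.isnan(X)                          # boolean mask of missing cells
n_missing_per_variable = missing.sum(axis=0)   # missing count per column
complete_rows = ~missing.any(axis=1)           # rows without any missing value
n_complete = int(complete_rows.sum())

print("n =", n, "p =", p)
print("missing per variable:", n_missing_per_variable)
print("complete observations:", n_complete)
```

Even though only three of the fifteen cells are missing, just two of the five observations are complete, which shows how quickly the usable sample shrinks when incomplete rows are dropped.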
A common approach is to omit all observations that contain missing values. However, this decreases the sample size and thus increases the variance of estimators. In addition, it may lead to biased estimates if the values are missing at random (MAR) rather than completely at random, that is, if the probability of missingness depends on covariates.
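The bias caused by omitting incomplete observations can be illustrated with a small simulation. The sketch below is a minimal example under assumed settings that do not come from the text: the model `y = 2 + x + noise`, the missingness probabilities 0.8 and 0.1, and the sample size are all hypothetical. In the first scenario the probability that `y` is missing depends on the covariate `x`; in the second it is the same for every observation.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical model: the covariate x drives the outcome y (true mean of y is 2).
x = rng.normal(size=n)
y = 2.0 + x + rng.normal(size=n)

# Missingness depends on the covariate: y is missing far more often when x > 0.
p_miss_mar = np.where(x > 0, 0.8, 0.1)
miss_mar = rng.random(n) < p_miss_mar

# Missingness independent of everything, with the same overall missing rate.
miss_mcar = rng.random(n) < p_miss_mar.mean()

full_mean = y.mean()                  # estimate from the complete data
cc_mean_mar = y[~miss_mar].mean()     # complete cases, covariate-dependent missingness
cc_mean_mcar = y[~miss_mcar].mean()   # complete cases, purely random missingness

print("full data:               ", round(full_mean, 3))
print("complete cases (depends):", round(cc_mean_mar, 3))
print("complete cases (random): ", round(cc_mean_mcar, 3))
```

In this setup the complete-case mean stays close to the full-data mean when values are dropped purely at random, but it is clearly too small when missingness depends on `x`, because observations with large `x` (and hence large `y`) are removed more often.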
A better way to work around this problem is to impute the missing values. For some applications the imputations are chosen to minimize a prediction error. For...