What missing values are and how to deal with them
Data describing real-world phenomena often contain many missing values. Missing data cannot be overlooked, especially when the analyst wants to study the dataset in depth, for example to understand how strongly its variables are correlated.
Mishandling missing values can have several consequences:
- The statistical power of analyses involving variables with missing values is reduced, especially when a substantial share of a single variable's values is missing.
- The dataset may also become less representative, so it may no longer correctly reflect the substantive characteristics of the full population of observations of the phenomenon.
- Statistical estimates may not converge to the true population values, which introduces bias.
- As a result, the conclusions of the analysis may be incorrect.
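To make the bias and the loss of observations concrete, here is a minimal sketch in Python using only the standard library. The population (the integers 1 to 100) and the missingness rule (every value above 80 goes unrecorded) are hypothetical assumptions chosen for illustration; `None` marks a missing entry, and `complete_case_mean` is a helper introduced here, not a library function. It averages only the observed values, as simply deleting missing entries would:

```python
from statistics import mean


def complete_case_mean(values):
    """Return the mean of the non-missing entries and how many there are.

    Missing entries are represented by None (an assumption of this sketch).
    """
    observed = [v for v in values if v is not None]
    return mean(observed), len(observed)


# Hypothetical population: the values 1..100.
population = list(range(1, 101))

# Hypothetical missingness mechanism: every value above 80 is not recorded,
# so whether a value is missing depends on the value itself.
recorded = [v if v <= 80 else None for v in population]

true_mean = mean(population)
estimate, n_observed = complete_case_mean(recorded)

print(f"true mean: {true_mean}")
print(f"complete-case mean: {estimate} from {n_observed} of {len(population)} values")
```

Running it prints a true mean of 50.5 against a complete-case mean of 40.5 computed from only 80 of 100 values: the sample is smaller, and because the missing values are exactly the large ones, the estimate is systematically too low rather than just noisier.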
But let's look at what can cause missing values in...