Detecting and removing missing values
Missing values are values that should have been recorded but, for some reason, were not. R represents them with NA (not available). They are different from values without meaning, which R represents with NaN (not a number).
Most of us first ran into missing values in a situation like the following one:
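The distinction between a missing value and a value without meaning can be checked directly in base R. The following is only a minimal sketch; the vector `v` is a made-up example, not data from this recipe.

```r
# Hypothetical vector: a regular number, a missing value, and a
# meaningless result (0 / 0 evaluates to NaN)
v <- c(1, NA, 0 / 0)

# is.na() flags both NA and NaN
na_flags <- is.na(v)
print(na_flags)

# is.nan() flags only NaN, so it can tell the two apart
nan_flags <- is.nan(v)
print(nan_flags)
```

Note that `is.na()` treats NaN as missing too, so `is.nan()` is the function to reach for when the two cases need to be separated.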
> x <- c(1,2,3,NA,4)
> mean(x)
[1] NA
"Oh come on, I know you can do it. Just ignore that useless NA" was probably your reaction, or at least it was mine.
Fortunately, R comes packed with good functions for missing value detection and handling.
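As a rough illustration of what base R offers here, the sketch below reuses the vector from the example above and applies a few standard functions. It is one possible starting point, not necessarily the approach this recipe goes on to develop.

```r
x <- c(1, 2, 3, NA, 4)

# Is there at least one missing value?
has_na <- anyNA(x)

# Where are the missing values, and how many are there?
na_positions <- which(is.na(x))
na_count <- sum(is.na(x))

# Ask mean() to skip missing values instead of returning NA
mean_without_na <- mean(x, na.rm = TRUE)

print(has_na)
print(na_positions)
print(na_count)
print(mean_without_na)
```

Many summary functions in base R, such as `sum()`, `median()`, and `sd()`, accept the same `na.rm` argument.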
In this recipe and the following one, we will see two opposite approaches to missing value handling:
- Removing missing values
- Simulating missing values by interpolation
I have to warn you that removing missing values is appropriate only in a small number of cases, since it compromises the integrity of your data sources and can greatly reduce the reliability of your results.
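To get a feeling for how much data removal can cost, it helps to count the complete rows before dropping anything. The data frame `df` below is a made-up example used only for illustration.

```r
# Hypothetical data frame with one missing value in each column
df <- data.frame(
  a = c(1, NA, 3, 4),
  b = c("x", "y", NA, "w")
)

# TRUE for rows with no missing value in any column
complete_rows <- complete.cases(df)

# Share of rows that would survive the removal
share_kept <- mean(complete_rows)

# na.omit() drops every row containing at least one NA
df_clean <- na.omit(df)

print(share_kept)
print(nrow(df_clean))
```

With only two missing cells, half of the rows are lost, because each NA takes its whole row with it.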
Nevertheless, if you are...