Implementing missing value imputation algorithms
So far, we have used Python and R interchangeably to implement solutions to the problems addressed in this book. For missing value analysis, however, we will favor R over Python, and there is a compelling reason for this choice. R has traditionally been used by statisticians and data miners for statistical software development and data analysis, and it has an extensive collection of packages for statistical work. Some of the packages built specifically for missing value analysis are unrivaled in Python’s ecosystem. In other words, R gives you access to powerful, statistically specialized tools that are not only more sophisticated than their Python counterparts but also very easy to use.
So, suppose you need to compute the Pearson correlation coefficient between the two numeric variables Age and Fare of the Titanic disaster dataset. Let’s first...