Diagnosing missing values in R and Python
Before imputing missing values in a dataset, we first need to know how much of each variable is missing.
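As a minimal sketch of what such a per-variable diagnosis can look like, the following Python snippet counts the missing values in each column of a pandas DataFrame and expresses them as a percentage. The `missing_summary` helper and the toy data are hypothetical and used only for illustration; they are not taken from the chapter's code files.

```python
import numpy as np
import pandas as pd


def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Return the count and percentage of missing values per column."""
    summary = pd.DataFrame({
        # isna() flags missing cells; sum() counts them per column
        "n_missing": df.isna().sum(),
        # the mean of a boolean column is the fraction of True values
        "pct_missing": (df.isna().mean() * 100).round(1),
    })
    # Show the most affected variables first
    return summary.sort_values("pct_missing", ascending=False)


# Hypothetical toy dataset, for illustration only
df = pd.DataFrame({
    "age": [22, np.nan, 35, np.nan],
    "fare": [7.25, 71.28, np.nan, 8.05],
    "sex": ["m", "f", "f", "m"],
})

summary = missing_summary(df)
print(summary)
```

Sorting by percentage puts the variables that need the most attention at the top, which is usually the first thing we want to see before choosing an imputation strategy.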
The code used in this section can be found in the R\03-diagnose-missing-values-in-r.R and Python\03-diagnose-missing-values-in-python.py files in Chapter 16. To properly run this code and the code in the following sections, you must install the required R and Python packages as follows:
- Open Anaconda Prompt.
- Enter the conda activate pbi_powerquery_env command.
- Enter the pip install missingno==0.5.2 command.
- Enter the pip install upsetplot==0.8.0 command.
- Then, open RStudio and make sure it is referencing your latest CRAN R (version 4.4.2 in our case).
- Click on the Console window and enter install.packages(c("naniar", "imputeTS", "forecast", "ggpubr", "missForest", "mice", "miceadds...