Handling missing data with univariate imputation using scikit-learn
Scikit-learn is a very popular machine learning library in Python. It offers a wide range of options for everyday machine learning tasks, such as classification, regression, clustering, dimensionality reduction, model selection, and preprocessing.
Additionally, the library offers multiple options for univariate and multivariate data imputation.
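As a minimal sketch of what univariate imputation looks like in scikit-learn, the following example applies SimpleImputer to a small made-up array. The data is purely illustrative and is not one of the datasets used in this recipe. Each missing value is filled using a statistic computed from the observed values of the same column.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Toy single-column data with two missing values (illustrative only,
# not one of the recipe's datasets)
X = np.array([[1.0], [2.0], [np.nan], [4.0], [np.nan], [8.0]])

# Univariate imputation: each column is filled using only its own
# observed values, here the column mean and the column median
mean_imputer = SimpleImputer(strategy="mean")
median_imputer = SimpleImputer(strategy="median")

X_mean = mean_imputer.fit_transform(X)
X_median = median_imputer.fit_transform(X)

print(X_mean.ravel())    # missing entries replaced by the mean of 1, 2, 4, 8
print(X_median.ravel())  # missing entries replaced by the median of 1, 2, 4, 8
```

Other values of the strategy parameter, such as "most_frequent" and "constant", follow the same fit-and-transform pattern.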
Getting ready
You can download the Jupyter notebooks and requisite datasets from the GitHub repository. Please refer to the Technical requirements section of this chapter.
This recipe will utilize the three functions prepared earlier (read_dataset, rmse_score, and plot_dfs).
You will be using four datasets from the Ch7 folder: clicks_original.csv, clicks_missing.csv, co2_original.csv, and co2_missing_only.csv. The datasets are available from the GitHub repository.
How to do it…
You will start by importing the libraries and then read all...