Marking imputed values
A missing indicator is a binary variable that takes the value 1 (or True) when a value was missing, and 0 (or False) otherwise. It is common practice to replace missing observations with the mean, median, or most frequent category while simultaneously marking those observations with missing indicators, so that the information about which values were originally absent is preserved after imputation. In this recipe, we will learn how to add missing indicators using pandas, scikit-learn, and feature-engine.
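Before turning to the recipe itself, the idea can be sketched with pandas alone. The snippet below is a minimal illustration, not the recipe's code: the toy DataFrame, its column names, and the `_na` suffix for the indicator columns are all assumptions made for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the column names are illustrative only.
df = pd.DataFrame({
    "age": [25.0, np.nan, 40.0, 31.0],
    "city": ["Paris", "Rome", None, "Rome"],
})

# Add one binary indicator per column BEFORE imputing,
# otherwise the information about missingness is lost.
for col in ["age", "city"]:
    df[f"{col}_na"] = df[col].isna().astype(int)

# Impute the numerical column with its median and the
# categorical column with its most frequent category.
df["age"] = df["age"].fillna(df["age"].median())
df["city"] = df["city"].fillna(df["city"].mode()[0])

print(df)
```

In this sketch the imputation statistics are computed on the same data they fill. With a train/test split, they would normally be learned from the training set only and then applied to both sets.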
How to do it...
Let’s begin by making some imports and loading the data:
- Let’s import the required libraries, functions, and classes:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.impute import SimpleImputer
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from feature_engine.imputation import (
    AddMissingIndicator,
    CategoricalImputer,
    MeanMedianImputer
...