Capturing missing values in a bespoke category
Missing data in categorical variables can be treated as a category in its own right, so it is common to replace missing values with the string Missing. In this recipe, we will learn how to do so using pandas, scikit-learn, and Feature-engine.
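Before working through the recipe, the idea can be sketched on a toy example. The snippet below is a minimal illustration, not the recipe's own code: the `toy` DataFrame and its `colour` column are hypothetical, invented here only to show the technique. It fills the missing entries with the string Missing, first with pandas and then with scikit-learn's `SimpleImputer` using a constant fill value.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical toy data, not the recipe's dataset.
toy = pd.DataFrame(
    {"colour": pd.Series(["red", np.nan, "blue", np.nan], dtype=object)}
)

# pandas: replace NaN with the string "Missing".
filled_pandas = toy["colour"].fillna("Missing")

# scikit-learn: constant imputation with the same fill value.
# fit_transform returns a 2D NumPy array, one column per input variable.
imputer = SimpleImputer(strategy="constant", fill_value="Missing")
filled_sklearn = imputer.fit_transform(toy[["colour"]])

print(filled_pandas.tolist())
print(filled_sklearn.ravel().tolist())
```

Both approaches leave the observed categories untouched and turn every missing entry into one additional category, which a downstream encoder can then handle like any other label.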
How to do it...
To proceed with the recipe, let's import the required tools and prepare the dataset:
- Import pandas and the required functions and classes from scikit-learn and Feature-engine:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.impute import SimpleImputer
from feature_engine.missing_data_imputers import CategoricalVariableImputer
- Let's load the dataset:
data = pd.read_csv('creditApprovalUCI...