Assembling an imputation pipeline with scikit-learn
Datasets often contain a mix of numerical and categorical variables. In addition, some variables may have only a few missing values, while others may be missing a large proportion of their values. The mechanisms by which data goes missing may also differ between variables. Thus, we may wish to apply different imputation procedures to different variables. In this recipe, we will learn how to apply different imputation procedures to different feature subsets using scikit-learn.
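To make the idea of per-subset imputation concrete, here is a minimal sketch using scikit-learn's ColumnTransformer and SimpleImputer. The toy DataFrame, its column names (age, income, city), and the choice of strategy per column are hypothetical and chosen only for illustration; this is not necessarily the exact code used in the recipe steps.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer

# Hypothetical toy data: two numerical columns and one categorical column,
# each with one missing value.
df = pd.DataFrame({
    "age": [25.0, np.nan, 40.0, 35.0],
    "income": [50.0, 60.0, np.nan, 80.0],
    "city": pd.Series(["London", np.nan, "Paris", "London"], dtype=object),
})

# Each (name, imputer, columns) tuple applies one imputation strategy
# to one subset of features; the strategies here are illustrative choices.
imputer = ColumnTransformer(
    transformers=[
        ("median_imputer", SimpleImputer(strategy="median"), ["age"]),
        ("mean_imputer", SimpleImputer(strategy="mean"), ["income"]),
        ("frequent_imputer", SimpleImputer(strategy="most_frequent"), ["city"]),
    ],
    remainder="passthrough",  # keep any columns not listed above unchanged
)

# Learn the medians/means/modes and fill in the missing values.
# The result is a NumPy array whose columns follow the transformer order.
imputed = imputer.fit_transform(df)
imputed_df = pd.DataFrame(imputed, columns=["age", "income", "city"])
print(imputed_df)
```

In practice, the imputer would be fitted on the training set only and then used to transform both the training and test sets, so that the learned statistics do not leak information from the test data.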
How to do it...
To proceed with the recipe, let's import the required libraries and classes and prepare the dataset:
- Let's import pandas and the required classes from scikit-learn:
import...