Handling missing data with univariate imputation using pandas
Generally, there are two approaches to imputing missing data: univariate imputation and multivariate imputation. This recipe will explore univariate imputation techniques available in pandas.
In univariate imputation, you use non-missing values in a single variable (think a column or feature) to impute the missing values for that variable. For example, if you have a sales column in the dataset with some missing values, you can use a univariate imputation method to impute missing sales observations using average sales. Here, a single column (sales) was used to calculate the mean (from non-missing values) for imputation.
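As a minimal sketch of the sales example, the snippet below fills missing values in a single column with the mean of its observed values. The DataFrame, its values, and the `sales_imputed` column name are hypothetical and used only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: a small sales column with two missing observations
df = pd.DataFrame({"sales": [100.0, np.nan, 120.0, 110.0, np.nan]})

# Series.mean() skips NaN by default, so only observed values contribute
sales_mean = df["sales"].mean()

# Fill the gaps with that mean; keep the original column for comparison
df["sales_imputed"] = df["sales"].fillna(sales_mean)

print(df)
```

Writing the result to a separate column is a choice made here so that the imputed values can be compared against the original gaps; overwriting the column in place would work just as well.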
Some basic univariate imputation techniques include the following:
- Imputing using the mean.
- Imputing using the last observation forward (forward fill). This can be referred to as Last Observation Carried Forward (LOCF).
- Imputing using the next observation backward (backward fill). This can be referred to as...
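The forward fill and backward fill techniques in the list above can be sketched with the pandas `ffill` and `bfill` methods. The series below is made-up daily data, assumed only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical daily series with gaps at the start, middle, and end
s = pd.Series(
    [np.nan, 10.0, np.nan, np.nan, 40.0, np.nan],
    index=pd.date_range("2024-01-01", periods=6, freq="D"),
)

# Forward fill: each gap takes the last observed value before it
ffilled = s.ffill()

# Backward fill: each gap takes the next observed value after it
bfilled = s.bfill()

print(pd.DataFrame({"original": s, "ffill": ffilled, "bfill": bfilled}))
```

Note that forward fill cannot fill a gap at the very start of the series, and backward fill cannot fill one at the very end, because there is no earlier or later observation to carry over.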