Using k-nearest neighbors for imputation
k-Nearest Neighbors (KNN) is a popular machine learning technique because it is intuitive, easy to run, and yields good results when the number of variables and observations is not large. For the same reasons, it is often used to impute missing values. As its name suggests, for each observation KNN identifies the k observations whose variables are most similar to it. When used to impute missing values, KNN uses the values of those nearest neighbors to determine what fill values to use.
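To make the idea concrete, here is a minimal sketch using scikit-learn's KNNImputer on a small made-up array (the data values are illustrative, not from the NLS data used later). Each missing value is replaced by the average of that column across the k nearest rows, with distances computed on the non-missing columns:

```python
import numpy as np
from sklearn.impute import KNNImputer

# Toy data: rows are observations, np.nan marks a missing value
X = np.array([
    [1.0, 2.0, np.nan],
    [3.0, 4.0, 3.0],
    [np.nan, 6.0, 5.0],
    [8.0, 8.0, 7.0],
])

# Fill each missing value with the mean of that column from the
# 2 nearest neighbors; distances ignore the missing entries
imputer = KNNImputer(n_neighbors=2)
X_filled = imputer.fit_transform(X)
print(X_filled)
```

By default, KNNImputer measures similarity with a Euclidean distance that skips missing entries (`nan_euclidean`), so observations with different missingness patterns can still be compared.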
Getting ready
We will work with the KNN imputer from scikit-learn version 1.3.0. If you do not already have scikit-learn, you can install it with `pip install scikit-learn`.
How to do it...
We can use KNN imputation to perform the same imputation we did with regression imputation in the previous recipe.
- We start by importing the KNNImputer from scikit-learn and loading the NLS data again:

```python
import pandas as pd
import numpy as np
from sklearn.impute import KNNImputer
```