Estimating missing data with nearest neighbors
In K-Nearest Neighbors (KNN) imputation, each missing value is replaced with an estimate derived from the observation's k closest neighbors. The neighbors of each observation are found using a distance metric such as the Euclidean distance, and the replacement value is the mean or the distance-weighted mean of the neighbors' values. In the weighted case, more distant neighbors have less influence on the replacement value. In this recipe, we will perform KNN imputation using scikit-learn.
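To make the idea concrete before we turn to the real data, here is a minimal sketch on a small, made-up array (the values are illustrative only and are not the recipe's dataset). It compares the plain mean (`weights="uniform"`) with the distance-weighted mean (`weights="distance"`) of the two nearest neighbors. By default, `KNNImputer` measures closeness with a NaN-aware Euclidean distance, so rows with missing entries can still be compared:

```python
import numpy as np
from sklearn.impute import KNNImputer

# Toy data (made up for illustration): rows 0 and 2 each have one missing value.
X = np.array([
    [1.0, 2.0, np.nan],
    [3.0, 4.0, 3.0],
    [np.nan, 6.0, 5.0],
    [8.0, 8.0, 7.0],
])

# Plain mean of the 2 nearest neighbors that have a value for the feature.
uniform_imputer = KNNImputer(n_neighbors=2, weights="uniform")
imputed_uniform = uniform_imputer.fit_transform(X)

# Distance-weighted mean: closer neighbors contribute more to the estimate.
weighted_imputer = KNNImputer(n_neighbors=2, weights="distance")
imputed_weighted = weighted_imputer.fit_transform(X)

print(imputed_uniform)
print(imputed_weighted)
```

With uniform weights, the missing value in the first row becomes the mean of the third-column values of its two closest rows (3 and 5). With distance weighting, the estimate moves toward the value of the closer row.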
How to do it...
To proceed with the recipe, let’s import the required libraries and prepare the data:
- Let’s import the required libraries, classes, and functions:
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.impute import KNNImputer
- Let’s load the dataset that we prepared in the Technical requirements section, keeping only some of the numerical variables:
variables = ["A2", "...