Replacing missing values
Removing missing values is a simple and quick way to handle them. However, it is only effective when few values are missing. Replacing missing values is a better approach when many values are missing within critical variables and when the values are missing at random. This approach is also called imputation.
We can fill in missing values using the following approaches:
- Statistical measures: This involves using summary statistics such as mean, median, percentiles, and so on.
- Backfill or forward fill: In sequential data, we can use the last value before the missing value or the next value after the missing value. These are known as forward fill and backfill, respectively. This method is more appropriate when dealing with time series data where the missing values are likely to be time-dependent.
- Model-based: This involves using machine learning models such as linear regression or K-nearest neighbors (KNN). This is...
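The first two approaches can be sketched with pandas as follows. This is a minimal illustration rather than a prescribed recipe: the readings series and its values are hypothetical, and pandas and NumPy are assumed to be installed. The model-based approach is not covered here.

```python
import numpy as np
import pandas as pd

# Hypothetical daily readings with two gaps (values invented for illustration)
readings = pd.Series(
    [20.0, np.nan, 22.0, np.nan, 30.0],
    index=pd.date_range("2024-01-01", periods=5, freq="D"),
    name="temperature",
)

# Statistical measures: replace every gap with one summary statistic.
# mean() and median() skip NaN by default, so they use only 20, 22 and 30.
mean_filled = readings.fillna(readings.mean())
median_filled = readings.fillna(readings.median())

# Forward fill: carry the last observed value forward into the gap.
forward_filled = readings.ffill()

# Backfill: pull the next observed value back into the gap.
backward_filled = readings.bfill()

print(pd.DataFrame({
    "original": readings,
    "mean": mean_filled,
    "median": median_filled,
    "ffill": forward_filled,
    "bfill": backward_filled,
}))
```

Note that a forward fill leaves a gap at the very start of a series unfilled, and a backfill leaves a gap at the very end unfilled, because there is no neighboring value to copy from in that direction.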