Summary
In this chapter, we explored the most popular approaches to missing value imputation and discussed the advantages and disadvantages of each. Assigning the overall sample mean is usually not a good approach, particularly when observations with missing values differ from other observations in important ways. Mean imputation can also substantially reduce the variance of the imputed feature. Forward or backward filling allows us to maintain the variance in our data, but works best when the proximity of observations is meaningful, such as with time series or longitudinal data. In most non-trivial cases, we will want to use a multivariate technique, such as regression, KNN, or random forest imputation. In the next chapter, we will learn about encoding, transforming, and scaling features.
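As a quick recap, the following minimal sketch contrasts three of these approaches on a small made-up DataFrame. The column names and values are hypothetical and chosen only for illustration, and `n_neighbors=2` is an arbitrary setting rather than a recommendation.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

# Hypothetical toy data: income rises with age, and three income values are missing.
df = pd.DataFrame({
    "age": [22, 25, 31, 38, 44, 52, 58, 63],
    "income": [28.0, np.nan, 41.0, 50.0, np.nan, 70.0, np.nan, 82.0],
})

# Variance of the observed (non-missing) income values, for comparison.
observed_var = df["income"].var()

# Mean imputation: every missing value gets the same number,
# which pulls the variance of the feature down.
mean_filled = df["income"].fillna(df["income"].mean())

# Forward fill: each missing value takes the previous row's value.
# This only makes sense when row order is meaningful (here, sorted by age).
ffilled = df["income"].ffill()

# KNN imputation: a multivariate approach that uses age to find
# similar rows and averages their income values.
knn = KNNImputer(n_neighbors=2)
knn_filled = pd.DataFrame(knn.fit_transform(df), columns=df.columns)["income"]

print("observed variance:     ", round(observed_var, 1))
print("mean-imputed variance: ", round(mean_filled.var(), 1))
print("forward-filled variance:", round(ffilled.var(), 1))
print("KNN-imputed variance:  ", round(knn_filled.var(), 1))
```

On data like this, the mean-imputed series should show noticeably lower variance than the observed values, while forward filling and KNN imputation stay closer to the original spread.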