Chapter 3. Data Preparation – Clean
In this chapter, we will cover:
- Binning scale variables to address missing data
- Using a full data model/partial data model approach to address missing data
- Imputing in-stream mean or median
- Imputing missing values randomly from uniform or normal distributions
- Using random imputation to match a variable's distribution
- Searching for similar records using a neural network for inexact matching
- Using neuro-fuzzy searching to find similar names
- Producing longer Soundex codes