What Is Preprocessing?
Anyone who has worked on a machine learning project in a company knows that real-world data is messy. It is often aggregated from multiple sources, platforms, or recording devices, and it is frequently incomplete and inconsistent. In preprocessing, we aim to improve the data quality so that we can apply a machine learning model successfully.
Data preprocessing includes the following set of techniques:
- Feature transforms
  - Scaling
  - Power/log transforms
  - Imputation
- Feature engineering
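As a rough illustration of the three feature transforms listed above, here is a minimal sketch in plain Python (standard library only) that applies imputation, a log transform, and scaling to a single feature. The helper names `impute_mean`, `log_transform`, and `standard_scale` are hypothetical and exist only for this example. It assumes missing values are encoded as `None`, uses mean imputation, uses `log1p` as the log transform, and uses z-score standardization as the scaling method. Each helper operates on one feature at a time.

```python
import math
from statistics import mean, pstdev
from typing import List, Optional


def impute_mean(values: List[Optional[float]]) -> List[float]:
    """Imputation (hypothetical helper): replace missing entries (None)
    with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def log_transform(values: List[float]) -> List[float]:
    """Power/log transform (hypothetical helper): log(1 + x) compresses
    large values and reduces right skew; requires every x > -1."""
    return [math.log1p(v) for v in values]


def standard_scale(values: List[float]) -> List[float]:
    """Scaling (hypothetical helper): subtract the mean and divide by the
    population standard deviation, giving zero mean and unit variance."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant feature has no spread; map it to all zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# One messy, right-skewed feature with a missing reading (illustrative data).
raw = [1.0, 3.0, None, 15.0, 99.0]

filled = impute_mean(raw)        # None is replaced by 29.5, the mean of 1, 3, 15, 99
logged = log_transform(filled)   # compress the large values
scaled = standard_scale(logged)  # zero mean, unit variance

print(filled)
print([round(v, 3) for v in scaled])
```

In a real project, the statistics used here (the fill value, mean, and standard deviation) should be estimated on the training data only and then reused on validation and test data. scikit-learn packages this fit/transform pattern in classes such as `SimpleImputer`, `StandardScaler`, `PowerTransformer`, and `FunctionTransformer`.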
These techniques fall largely into two classes: either they tailor the data to the assumptions of the machine learning algorithm (feature transforms), or they construct more complex features from multiple underlying features (feature engineering). We'll only deal with univariate feature transforms, that is, transforms that apply to one feature at a time. We won't discuss multivariate feature...