Regularization with Data
Although models offer plenty of regularization methods (each model has its own set of hyperparameters), the most effective regularization sometimes comes from the data itself. Indeed, even the most powerful model cannot perform well if the data is not properly transformed beforehand.
In this chapter, we’ll look at some methods that help regularize models through the data itself:
- Hashing high-cardinality features
- Aggregating features
- Undersampling an imbalanced dataset
- Oversampling an imbalanced dataset
- Resampling imbalanced data with SMOTE