Different numerical columns may have different scales. One column's age is in the tens, while its salary is typically in the thousands. As we saw earlier, putting different columns into a similar scale helps in some cases. Here are some of the cases where scaling is recommended:
- It allows gradient-descent solvers to converge quicker.
- It is needed for algorithms such as KNN and Principle Component Analysis (PCA)
- When training an estimator, it puts the features on a comparable scale, which helps when juxtaposing their learned coefficients.
In the next sections, we are going to examine the most commonly used scalers.
The standard scaler
This converts the features into normal distribution by setting their mean to 0 and their standard deviation to 1. This is done using the following operation, where a column's mean value is subtracted from each value in it, and then...