Scaling
Scaling consists of bringing the numerical features in a dataset into the same range of values. For example, a dataset might contain ages between 30 and 75 years and salaries between 30,000 USD and 120,000 USD. Because the two features are on very different scales, the one with larger values (here, salary) can dominate calculations such as distances, which can hurt the model's performance.
Although scaling is not mandatory for many algorithms, those based on distance calculations, such as k-NN or k-means, need scaled continuous features to perform well.
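To make the effect on distance-based algorithms concrete, here is a minimal sketch in plain Python (no Optimus involved). The three people and their values are invented for illustration; only the age and salary ranges come from the example above. It compares Euclidean distances before and after rescaling both features to [0, 1].

```python
import math

# Hypothetical people as (age in years, salary in USD); values are invented for illustration.
alice = (30, 50_000)
bob = (75, 51_000)    # very different age, almost the same salary as alice
carol = (31, 80_000)  # almost the same age, very different salary

# Feature ranges from the example in the text: ages 30-75, salaries 30,000-120,000 USD.
RANGES = [(30, 75), (30_000, 120_000)]


def euclidean(p, q):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def min_max(point):
    """Rescale each feature of a point to [0, 1] using the known ranges."""
    return tuple((v - lo) / (hi - lo) for v, (lo, hi) in zip(point, RANGES))


raw_ab, raw_ac = euclidean(alice, bob), euclidean(alice, carol)
scaled_ab = euclidean(min_max(alice), min_max(bob))
scaled_ac = euclidean(min_max(alice), min_max(carol))

print(f"raw:    alice-bob={raw_ab:.1f}  alice-carol={raw_ac:.1f}")
print(f"scaled: alice-bob={scaled_ab:.3f}  alice-carol={scaled_ac:.3f}")
```

With the raw values, bob appears far closer to alice than carol does, purely because the salary gap swamps the 45-year age gap. After scaling, both features contribute comparably and carol becomes alice's nearest neighbor.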
To help you with this task, Optimus gives you three scaling methods:
- Normalization
- Standardization
- Max abs scaling
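As a rough reference for what each method computes, the following NumPy sketch implements the textbook formulas. It is not Optimus's implementation: the function names are hypothetical, and Optimus may differ in details such as how it handles missing values or which standard deviation it uses. Here NaN values are ignored when computing the statistics and stay NaN in the output; a constant column would cause a division by zero, which this sketch does not guard against.

```python
import numpy as np


def min_max_scale(x):
    """Normalization: (x - min) / (max - min), mapping values to [0, 1]."""
    x = np.asarray(x, dtype=float)
    lo, hi = np.nanmin(x), np.nanmax(x)  # ignore NaN when finding the bounds
    return (x - lo) / (hi - lo)


def standard_scale(x):
    """Standardization: (x - mean) / std, giving mean 0 and standard deviation 1."""
    x = np.asarray(x, dtype=float)
    # np.nanstd uses the population standard deviation (ddof=0) by default.
    return (x - np.nanmean(x)) / np.nanstd(x)


def max_abs_scale(x):
    """Max abs scaling: x / max(|x|), mapping values to [-1, 1] without shifting them."""
    x = np.asarray(x, dtype=float)
    return x / np.nanmax(np.abs(x))


values = [1.12, 3.2, 4.35, 6.3, 7.3, np.nan]
print(min_max_scale(values))
print(standard_scale(values))
print(max_abs_scale(values))
```

Note the design difference: max abs scaling only divides, so zeros stay zero and signs are preserved, whereas normalization and standardization also shift the data.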
To show you how they work, let's start by creating a simple dataframe:
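import numpy as np
from optimus import Optimus

op = Optimus("pandas")  # assumes the pandas engine; skip if op is already initialized
df = op.create.dataframe({"A": [1.12, 3.2, 4.35, 6.3, 7.3, np.nan]})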
Now, let's learn how to apply normalization.
Normalization
Normalization (also called min-max normalization) scales all the values into a fixed range between 0 and 1....
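For a quick check of what normalization should produce on column A, here is a sketch that applies the min-max formula, (x - min) / (max - min), with plain pandas rather than Optimus. The column name `A_normalized` is hypothetical. pandas skips NaN when computing `min()` and `max()`, so the missing value simply stays missing.

```python
import numpy as np
import pandas as pd

# Same values as column "A" in the Optimus dataframe, built here with plain pandas.
pdf = pd.DataFrame({"A": [1.12, 3.2, 4.35, 6.3, 7.3, np.nan]})

# Series.min() and Series.max() skip NaN by default (skipna=True).
col_min = pdf["A"].min()  # 1.12
col_max = pdf["A"].max()  # 7.3

# Min-max normalization; NaN stays NaN because arithmetic with NaN yields NaN.
pdf["A_normalized"] = (pdf["A"] - col_min) / (col_max - col_min)
print(pdf.round(3))
```

The range is 7.3 - 1.12 = 6.18, so, for example, 3.2 becomes 2.08 / 6.18 ≈ 0.337, while the smallest value maps to 0 and the largest to 1.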