Scaling features to a range
When working with machine learning models, it is important to preprocess the data in order to avoid problems such as exploding gradients or a poor representation of the data's distribution.
To transform raw feature vectors into a representation that is better suited for the downstream estimators, the sklearn.preprocessing
package offers a number of common utility functions and transformer classes.
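
As a rough illustration of how these transformer classes are used, and of the range scaling named in the title, the sketch below fits a MinMaxScaler on a small training array and reuses it on new data. The arrays X_train and X_test are made-up values for illustration only; the default feature_range of (0, 1) is assumed.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical toy data: three training samples with three features each.
X_train = np.array([[1.0, -1.0, 2.0],
                    [2.0, 0.0, 0.0],
                    [0.0, 1.0, -1.0]])
# A new sample that falls partly outside the range seen during training.
X_test = np.array([[-3.0, -1.0, 4.0]])

# Learn the per-feature minimum and maximum from the training data only,
# then map each training feature onto the default range [0, 1].
scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)

# Apply the same learned min/max to unseen data; values outside the
# training range land outside [0, 1].
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled)
print(X_test_scaled)
```

Fitting on the training data and only transforming the test data keeps information about the test set out of the scaling parameters.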
Many machine learning estimators in scikit-learn require the dataset to be standardized; they may behave poorly if the individual features do not more or less resemble standard normally distributed data, that is, Gaussian with a mean of 0 and a variance of 1.
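
A minimal sketch of standardization, assuming StandardScaler is the tool of choice: each feature is centered on its mean and divided by its standard deviation, so the result has a mean of 0 and a variance of 1 per column. The array X holds made-up values chosen so the two features sit on very different scales.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: two features on very different scales.
X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 400.0]])

# Center each column on its mean and scale it to unit variance.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# After scaling, every column should have mean ~0 and standard deviation ~1.
col_means = X_scaled.mean(axis=0)
col_stds = X_scaled.std(axis=0)
print(col_means, col_stds)
```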
In general, learning algorithms benefit from standardizing the dataset. If the data contains outliers, robust scalers or transformers are more appropriate. The behavior of several scalers, transformers, and normalizers on a dataset with marginal outliers is highlighted in the analysis...
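
To make the point about outliers concrete, the sketch below compares StandardScaler with RobustScaler on a single made-up feature containing one extreme value. The data and the variable names are illustrative assumptions, not taken from the analysis mentioned above.

```python
import numpy as np
from sklearn.preprocessing import RobustScaler, StandardScaler

# Hypothetical feature: five ordinary values and one extreme outlier.
x = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [1000.0]])

# StandardScaler uses the mean and standard deviation, both of which
# are pulled strongly toward the outlier.
standard = StandardScaler().fit_transform(x)

# RobustScaler uses the median and interquartile range instead, which
# are much less sensitive to a single extreme value.
robust = RobustScaler().fit_transform(x)

# Spread (max minus min) of the five ordinary values after each scaling.
inlier_spread_standard = standard[:5].max() - standard[:5].min()
inlier_spread_robust = robust[:5].max() - robust[:5].min()
print(inlier_spread_standard, inlier_spread_robust)
```

With standard scaling the ordinary values are squeezed into a very narrow interval, while robust scaling keeps them spread out and leaves the outlier far from the rest.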