Scaling with the median and quantiles
When scaling variables to the median and quantiles, the median is subtracted from each observation, and the result is divided by the interquartile range (IQR). The IQR is the difference between the 3rd quartile (Q3) and the 1st quartile (Q1), or, in other words, the difference between the 75th percentile and the 25th percentile:

x_scaled = (x - median(x)) / (Q3(x) - Q1(x))
This method is known as robust scaling because the median and the IQR are far less sensitive to outliers than the mean and the standard deviation, so they give more robust estimates of the center and value range of the variable. Robust scaling is a suitable alternative to standardization when models require the variables to be centered and the data contains outliers. Because it is a linear transformation, robust scaling does not change the shape of the variable distribution.
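To make the arithmetic concrete, the scaling described above can be sketched in plain Python before turning to scikit-learn. This is only an illustration: the helper name robust_scale and the sample values are hypothetical, and the quartiles are computed with the standard library's inclusive (linear interpolation) method, which is an assumption about how the percentiles are estimated.

```python
from statistics import median, quantiles


def robust_scale(values):
    """Subtract the median and divide by the IQR (Q3 - Q1).

    Hypothetical helper for illustration only.
    """
    # quantiles(..., n=4) returns the three cut points [Q1, Q2, Q3].
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        # This sketch simply refuses to scale a variable with zero IQR.
        raise ValueError("IQR is zero; the variable cannot be scaled this way")
    center = median(values)
    return [(v - center) / iqr for v in values]


# Hypothetical sample: 100 is an outlier.
values = [1, 2, 3, 4, 100]
scaled = robust_scale(values)
print(scaled)  # [-1.0, -0.5, 0.0, 0.5, 48.5]
```

Here the median is 3 and the IQR is 4 - 2 = 2, so the outlier does not influence how the other observations are scaled. The outlier itself remains extreme after scaling (48.5), which is consistent with the transformation preserving the shape of the distribution.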
How to do it...
In this recipe, we will implement scaling with the median and IQR by utilizing scikit-learn:
- Let’s start by importing pandas and the required scikit-learn classes and functions:
  import pandas as pd
  from sklearn.datasets...