Flooring and capping outliers
Quantile-based flooring and capping are two related outlier handling techniques. Both replace extreme values with fixed values, which in this case are quantiles of the data.
Flooring involves replacing small extreme values with a predetermined minimum value, such as the value of the 10th percentile. On the other hand, capping involves replacing large extreme values with a predetermined maximum value, such as the value of the 90th percentile.
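As a minimal sketch of the idea, the snippet below floors and caps a small, made-up series. The values and the two bounds are hypothetical placeholders standing in for the percentile values described above; they are not taken from the recipe's dataset.

```python
import pandas as pd

# Hypothetical readings with one very small and one very large value.
values = pd.Series([2, 48, 50, 51, 53, 49, 250])

# Hypothetical bounds standing in for, e.g., the 10th and 90th percentile values.
lower_bound, upper_bound = 45, 55

# Flooring: replace values below the lower bound with the lower bound.
floored = values.mask(values < lower_bound, lower_bound)

# Capping: replace values above the upper bound with the upper bound.
capped = floored.mask(floored > upper_bound, upper_bound)

print(capped.tolist())  # [45, 48, 50, 51, 53, 49, 55]
```

Only the two extreme values change; everything already inside the bounds is left untouched.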
These techniques are most appropriate when extreme values are likely caused by measurement errors or data entry errors. In cases where the outliers are genuine observations, replacing them will likely introduce bias.
We will explore how to handle outliers using the flooring and capping approach. We will use the quantile method in pandas to achieve this.
Getting ready
We will work with the Amsterdam House Prices data for this recipe. You can retrieve all the files from the GitHub repository.
How to do it…
We will...