Capping outliers using quantiles
When capping outliers, we clip a variable's extreme values to a maximum or minimum value determined by some statistical parameter. A typical strategy is to set outliers to a specified percentile. For example, we can set all values below the 5th percentile to the value at the 5th percentile and all values above the 95th percentile to the value at the 95th percentile. In this recipe, we will cap variables at values determined by percentiles of our choice using pandas and Feature-engine.
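Before working with the real dataset, the idea can be illustrated on a toy series. The following is a minimal sketch, not the recipe's own code: the helper name `cap_at_quantiles` and the toy values are assumptions made for illustration, and the 5th and 95th percentiles are the thresholds from the example above.

```python
import pandas as pd


def cap_at_quantiles(s: pd.Series, lower: float = 0.05, upper: float = 0.95) -> pd.Series:
    """Clip a series to the values found at its lower and upper quantiles.

    Hypothetical helper for illustration only.
    """
    low = s.quantile(lower)    # value at the lower percentile
    high = s.quantile(upper)   # value at the upper percentile
    # Values below `low` become `low`; values above `high` become `high`.
    return s.clip(lower=low, upper=high)


# Toy data with one obvious extreme value (100.0).
values = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 100.0])
capped = cap_at_quantiles(values)

print(capped.min(), capped.max())
```

Only the smallest and largest observations change here; everything between the two percentile values is left untouched. In practice, the percentile values would typically be learned from the train set and then applied to both the train and test sets.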
How to do it...
Let’s first import the Python libraries and load the data:
- Import the required Python libraries:
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from feature_engine.outliers import Winsorizer
- Let’s load the Breast Cancer dataset from scikit-learn:
breast_cancer = load_breast_cancer()
X = pd.DataFrame( breast_cancer...