Bringing outliers back within acceptable limits
Removing error outliers can be a valid strategy. However, this approach can reduce statistical power, particularly when outliers occur across many variables, because we end up removing large parts of the dataset. An alternative way to handle error outliers is to bring them back within acceptable limits. In practice, this means replacing the outlier values with thresholds identified by the IQR proximity rule, the mean and standard deviation, or the MAD. In this recipe, we’ll replace outlier values using pandas and feature-engine.
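To make the idea concrete, here is a minimal sketch of capping with the mean and standard deviation in pandas. The helper name cap_outliers, the toy data, and the factor of 3 standard deviations are assumptions chosen for illustration; they are not taken from the recipe itself.

```python
import pandas as pd


def cap_outliers(s: pd.Series, k: float = 3.0) -> pd.Series:
    """Hypothetical helper: clip values to mean +/- k standard deviations."""
    lower = s.mean() - k * s.std()
    upper = s.mean() + k * s.std()
    # Values below `lower` become `lower`; values above `upper` become `upper`.
    return s.clip(lower=lower, upper=upper)


# Toy data (an assumption): 20 typical values and one extreme value.
values = pd.Series([10.0] * 10 + [12.0] * 10 + [100.0])
capped = cap_outliers(values)

print(values.max(), capped.max())
```

Note that this sketch computes the limits on the same data it caps. In a modeling workflow, we would typically learn the limits from the train set only and then apply them to both the train and test sets.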
How to do it...
We’ll use the mean and standard deviation to find outliers and then replace their values using pandas and feature-engine:
- Let’s import the required Python libraries and functions:
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from feature_engine.outliers import Winsorizer
- Load...