Applying winsorization
Winsorizing, or winsorization, consists of replacing extreme, poorly known observations, that is, outliers, with the value of the nearest observation that is not treated as an outlier: the next largest value at the lower tail, or the next smallest value at the upper tail. It’s similar to the procedure described in the previous recipe, Bringing outliers back within acceptable limits, but not exactly the same. Winsorization replaces the same number of outliers at both ends of the distribution, which makes it a symmetric process. Because of this symmetry, the winsorized mean, that is, the mean estimated after replacing the outliers, remains a robust estimator of the central tendency of the variable.
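The definition above can be illustrated with a minimal sketch. The function name `winsorize_k`, the parameter `k` (the number of observations replaced at each tail), and the toy data are assumptions made for illustration only; they are not taken from the recipe.

```python
import numpy as np


def winsorize_k(values, k):
    """Replace the k smallest and k largest values with their nearest retained neighbours.

    Hypothetical helper for illustration: k is the number of observations
    replaced at EACH tail, which keeps the procedure symmetric.
    """
    x = np.asarray(values, dtype=float)
    if k == 0:
        return x.copy()
    if 2 * k >= x.size:
        raise ValueError("k is too large for the number of observations")
    sorted_x = np.sort(x)
    lower = sorted_x[k]        # smallest value that is kept as-is
    upper = sorted_x[-k - 1]   # largest value that is kept as-is
    # Values below `lower` become `lower`; values above `upper` become `upper`.
    return np.clip(x, lower, upper)


# Toy data with one extreme value at each end.
data = [1, 50, 52, 53, 55, 56, 58, 60, 61, 500]
winsorized = winsorize_k(data, k=1)

print(winsorized)
print("mean before:", np.mean(data))
print("winsorized mean:", winsorized.mean())
```

In this toy example, the value 1 is replaced with 50 and the value 500 is replaced with 61, so the winsorized mean is far less affected by the two extremes than the plain mean.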
In practice, to replace a similar number of observations at both tails, we’d use percentiles. For example, the 5th percentile is the value below which 5% of the observations lie, and the 95th percentile is the value above which 5% of the observations lie. Using these values as replacements might result in replacing a similar number of observations...
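A percentile-based version could be sketched with pandas as follows. The helper name `winsorize_by_percentile`, its default limits, and the simulated data are assumptions for illustration; the sketch relies only on `Series.quantile` and `Series.clip`.

```python
import numpy as np
import pandas as pd


def winsorize_by_percentile(series, lower_pct=0.05, upper_pct=0.95):
    """Cap a pandas Series at the given lower and upper percentiles.

    Hypothetical helper for illustration; the 5th/95th defaults mirror
    the example percentiles mentioned in the text.
    """
    lower = series.quantile(lower_pct)
    upper = series.quantile(upper_pct)
    # Values below `lower` are set to `lower`; values above `upper` to `upper`.
    return series.clip(lower=lower, upper=upper)


# Simulated variable: 1,000 draws from a standard normal distribution.
rng = np.random.default_rng(0)
s = pd.Series(rng.normal(size=1000))

capped = winsorize_by_percentile(s)

# Count how many observations were replaced at each tail.
n_low = int((s < s.quantile(0.05)).sum())
n_high = int((s > s.quantile(0.95)).sum())
print("replaced at lower tail:", n_low)
print("replaced at upper tail:", n_high)
```

With a continuous variable like this one, the two counts should be nearly identical. With heavily tied or discrete data, the number of values beyond each percentile can differ, which is why the counts are only described as similar.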