Dealing with outliers
The most widely used approaches to deal with outliers are as follows:
- Dropping them: The analyst removes the outliers altogether, having concluded that the final analysis will give better results without them.
- Capping them: A common strategy is to assign a fixed extreme value (a cap, as in winsorizing) to every observation that exceeds it in absolute value. This is appropriate only when it is certain that all the extreme observations behave in the same way as an observation at the cap value.
- Assigning a new value: In this case, the outliers are first replaced with null values. Those nulls are then imputed using one of the simplest techniques: filling every null with a fixed value, such as the mean or median of the variable in question. You'll see more complex imputation strategies in the next sections.
- Transforming the data: When the analyst is dealing with natural outliers, very often the histogram of the variable...
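The first three strategies in the list can be compared on a single variable with a short sketch in plain Python.

It makes several assumptions that are not part of the text. Outliers are flagged with Tukey's IQR rule (fences at Q1 - 1.5·IQR and Q3 + 1.5·IQR), and those fences also serve as the cap values. The function names (`iqr_bounds`, `drop_outliers`, `cap_outliers`, `impute_outliers`) and the sample data are hypothetical. The data transformation strategy is not covered.

```python
import statistics

def iqr_bounds(values, k=1.5):
    """Lower/upper fences from Tukey's IQR rule (an assumed outlier criterion)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def drop_outliers(values, lower, upper):
    """Dropping: keep only the observations inside the fences."""
    return [v for v in values if lower <= v <= upper]

def cap_outliers(values, lower, upper):
    """Capping: clamp every observation beyond a fence to that fence."""
    return [min(max(v, lower), upper) for v in values]

def impute_outliers(values, lower, upper, strategy="median"):
    """Assigning a new value: outliers -> None (null) -> fixed fill value."""
    # Step 1: replace each outlier with a null value.
    with_nulls = [v if lower <= v <= upper else None for v in values]
    # Step 2: compute the fill value from the remaining observations.
    observed = [v for v in with_nulls if v is not None]
    if strategy == "median":
        fill = statistics.median(observed)
    elif strategy == "mean":
        fill = statistics.mean(observed)
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    # Step 3: impute every null with that single fixed value.
    return [fill if v is None else v for v in with_nulls]

# Hypothetical sample with one high and one low outlier.
data = [10, 12, 11, 13, 12, 95, 11, 14, -40, 12]
lower, upper = iqr_bounds(data)
print("fences:", lower, upper)
print("dropped:", drop_outliers(data, lower, upper))
print("capped:", cap_outliers(data, lower, upper))
print("imputed (median):", impute_outliers(data, lower, upper))
```

Dropping shortens the variable. Capping and imputation keep every row, so the variable stays aligned with the other columns of the dataset. Capping preserves the direction of each extreme value, while median imputation pulls both outliers to the centre of the distribution.

If pandas is available, `Series.clip(lower, upper)` performs the same capping step on a column.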