Using the median absolute deviation to find outliers
The mean and the standard deviation are both strongly affected by outliers, so using them to identify outliers can defeat the purpose. A more robust alternative is the median absolute deviation (MAD). MAD is the median of the absolute deviations between each observation and the median of the variable, multiplied by a scaling constant:

$$\mathrm{MAD} = b \cdot \operatorname{median}\left(\lvert x_i - \operatorname{median}(X) \rvert\right)$$

In the previous equation, $x_i$ is each observation of the variable $X$. The beauty of MAD is that it uses the median instead of the mean, and the median is robust to outliers. The constant $b$ is used to estimate the standard deviation from MAD; if we assume normality, then $b = 1.4826$.
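To make the definition concrete, here is a minimal sketch that computes the scaled MAD using only Python's standard library. The function name `scaled_mad` and the sample values are illustrative assumptions, not part of the original recipe.

```python
from statistics import median, pstdev


def scaled_mad(values, b=1.4826):
    """Return b * median(|x_i - median(X)|) for a sequence of numbers.

    The default b = 1.4826 assumes the data are roughly normal.
    """
    center = median(values)
    abs_deviations = [abs(v - center) for v in values]
    return b * median(abs_deviations)


# Hypothetical sample with one extreme value (100.0).
data = [2.0, 3.0, 3.0, 4.0, 5.0, 100.0]

mad_value = scaled_mad(data)
std_value = pstdev(data)

print(f"scaled MAD: {mad_value:.4f}")
print(f"standard deviation: {std_value:.4f}")
```

In this sample, the single extreme value inflates the standard deviation far more than it affects the scaled MAD, which is the robustness property described above.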
Note
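If the variable is assumed to follow a different distribution, $b$ is instead calculated as 1 divided by the 75th percentile of that distribution, standardized to zero mean and unit variance. For the standard normal distribution, the 75th percentile is approximately 0.6745, and 1/0.6745 ≈ 1.4826.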
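As a quick check of the constant for the normal case, the following sketch derives it from the 75th percentile of the standard normal distribution with the standard library's `statistics.NormalDist`. The variable names are illustrative; for another assumed distribution, the quantile of that distribution would have to be substituted.

```python
from statistics import NormalDist

# 75th percentile of the standard normal distribution (mean 0, std 1).
q75 = NormalDist(mu=0.0, sigma=1.0).inv_cdf(0.75)

# The scaling constant is the reciprocal of that percentile.
b = 1.0 / q75

print(f"75th percentile: {q75:.4f}")
print(f"b: {b:.4f}")
```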
After computing MAD, we use the median and MAD to establish distribution limits, designating values beyond these limits as outliers. The limits are set as the median plus...