Detecting outliers using a modified z-score
In the Detecting outliers using a z-score recipe, you saw how simple and intuitive the method is. It has one major drawback, however: it assumes your data is normally distributed.
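But what if your data is not normally distributed? Luckily, there is a modified version of the z-score that works with non-normal data. The main difference between the regular z-score and the modified z-score is that we replace the mean with the median and the standard deviation with the median absolute deviation (MAD):
$$\text{Modified } Z = \frac{0.6745\,(x_i - \tilde{x})}{\text{MAD}}$$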
Where $\tilde{x}$ is the median of the dataset, and MAD is the median absolute deviation of the dataset:
$$\text{MAD} = \operatorname{median}\left(\lvert x_i - \tilde{x} \rvert\right)$$
The 0.6745 value is the standard deviation unit that corresponds to the 75th percentile (Q3) of a Gaussian distribution and is used as a normalization factor. In other words, it scales the MAD so that it approximates the standard deviation. This way, the units you obtain from this method are measured in standard deviations, similar to how you would interpret the regular z-score.
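To make the definition concrete, here is a minimal sketch of the modified z-score using NumPy. The function name `modified_zscore`, the sample data, and the 3.5 cutoff are assumptions chosen for illustration; they are not taken from the recipe itself.

```python
import numpy as np

def modified_zscore(values, k=0.6745):
    """Return the modified z-score of each observation (illustrative helper)."""
    x = np.asarray(values, dtype=float)
    median = np.median(x)
    # MAD: median of the absolute deviations from the median
    mad = np.median(np.abs(x - median))
    if mad == 0:
        # More than half the values equal the median, so the score is undefined
        raise ValueError("MAD is zero; the modified z-score is undefined")
    return k * (x - median) / mad

# Hypothetical sample with one obvious extreme value
data = [10, 12, 11, 13, 12, 11, 95]
scores = modified_zscore(data)

# Assumed cutoff: 3.5 is a commonly cited threshold, adjust to your data
outliers = np.abs(scores) > 3.5
print(scores.round(2))
print(outliers)
```

In this sample the median is 12 and the MAD is 1, so only the last observation is flagged. The normalization constant 0.6745 is hard-coded here as the default for `k`.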
You can obtain this value using SciPy&apos...