Detecting point anomalies using IQR (Interquartile Range)
The basic algorithm for finding anomalies, or outliers, is based on the interquartile range. The idea is that values lying far out on either side of the bulk of the data are treated as anomalous. How far is "far" is determined by the boundaries of the box plot.
In descriptive statistics, the interquartile range (IQR), also called the midspread or middle fifty, is a measure of statistical dispersion equal to the difference between the upper and lower quartiles: IQR = Q3 − Q1. In other words, the IQR is the third quartile minus the first quartile; both quartiles can be read directly off a box plot of the data. It is a trimmed estimator, defined as the 25% trimmed range, and is a commonly used robust measure of scale.
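As a rough illustration of the definition IQR = Q3 − Q1, the sketch below computes the quartiles and the IQR with Python's standard library. The function name `quartiles_and_iqr` and the sample values are made up for this example. There are several conventions for computing quartiles; the "inclusive" method is chosen here as an assumption, and other conventions can give slightly different numbers on small samples.

```python
from statistics import quantiles


def quartiles_and_iqr(values):
    """Return (Q1, Q3, IQR) for a sequence of numbers.

    Assumption: quartiles are computed with the 'inclusive' method of
    statistics.quantiles, which interpolates linearly between data points.
    """
    # n=4 splits the data into four equal-probability groups and
    # returns the three cut points: Q1, the median, and Q3.
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    return q1, q3, q3 - q1


# Hypothetical sample with one value (30) far from the rest.
sample = [2, 4, 4, 5, 6, 7, 8, 9, 10, 30]
q1, q3, iqr = quartiles_and_iqr(sample)
print(q1, q3, iqr)  # 4.25 8.75 4.5
```

Note that the extreme value 30 has no influence on the result, since the IQR depends only on the middle half of the data. This is what makes it a robust measure of scale.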
The interquartile range is often used to find outliers in data. Outliers are observations that fall below Q1 − 1.5 × IQR or above Q3 + 1.5 × IQR. In a box plot, the highest...