Detecting outliers using visualizations
There are two general approaches for using statistical techniques to detect outliers: parametric and non-parametric methods. Parametric methods assume you know the underlying distribution of the data; for example, that the data follows a normal distribution. Non-parametric methods, on the other hand, make no such assumption.
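As an illustration of what a parametric assumption looks like in practice, here is a minimal sketch of one common parametric check, the z-score rule. This section does not describe it. The rule flags values far from the mean in units of standard deviation, and that cutoff is only meaningful if the data are roughly normal. The function name `zscore_outliers`, the cutoff `k=3.0`, and the sample data are illustrative assumptions.

```python
import statistics


def zscore_outliers(values, k=3.0):
    """Return values lying more than k sample standard deviations from the mean.

    Parametric check: the mean/standard-deviation cutoff assumes the data
    are approximately normally distributed. k=3.0 is an assumed rule of thumb.
    """
    mu = statistics.mean(values)
    sigma = statistics.stdev(values)  # sample standard deviation
    return [x for x in values if abs(x - mu) > k * sigma]


# Hypothetical data: 19 typical readings plus one extreme value.
readings = [9, 10, 11] * 6 + [10, 50]
print(zscore_outliers(readings))  # [50]
```

If the data are strongly skewed or heavy-tailed, the normality assumption behind this cutoff breaks down. That is the situation where non-parametric techniques such as histograms and box plots are useful.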
Histograms and box plots are basic non-parametric techniques that can provide insight into the distribution of the data and the presence of outliers. More specifically, box plots, also known as box-and-whisker plots, provide a five-number summary: the minimum, the first quartile (25th percentile), the median (50th percentile), the third quartile (75th percentile), and the maximum. Implementations differ in how far the whiskers extend; for example, the whiskers can extend to the minimum and maximum values. In most statistical software, including Python's matplotlib and seaborn libraries, the whiskers...
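The five-number summary that a box plot displays can also be computed directly. The following is a minimal sketch using only Python's standard library. It assumes quartiles are interpolated linearly between sorted values, which is what `statistics.quantiles(..., method="inclusive")` does and is also NumPy's default percentile rule. Other quartile definitions can give slightly different values on small samples. The function name and sample data are hypothetical.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a sequence of numbers.

    Quartiles use linear interpolation between sorted values
    (statistics.quantiles with method="inclusive"); this is an assumed
    convention, not the only one in use.
    """
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, median, q3, data[-1]


# Hypothetical data with one suspiciously large value.
sample = [5, 7, 4, 9, 6, 2, 8, 4, 42]
print(five_number_summary(sample))  # (2, 4.0, 6.0, 8.0, 42)
```

The large gap between the third quartile (8.0) and the maximum (42) is the kind of asymmetry a box plot makes visible at a glance. To draw a box plot whose whiskers reach the minimum and maximum, matching this summary, matplotlib's `plt.boxplot(sample, whis=(0, 100))` places the whisker ends at the 0th and 100th percentiles.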