Identifying outliers
There are different methods for detecting outliers, depending on whether you are analyzing one variable at a time (univariate analysis) or several variables at once (multivariate analysis). In the univariate case, the analysis is fairly straightforward. The multivariate case, however, is more complex. Let’s examine each case in turn.
Univariate outliers
One of the most direct and common ways to identify outliers for a single variable is to use a boxplot, which you learned about in Chapter 15, Adding Statistical Insights: Associations. The key elements of a boxplot are the interquartile range (IQR), defined as the distance from the first quartile (Q1) to the third quartile (Q3); the lower whisker (Q1 - 1.5 × IQR); and the upper whisker (Q3 + 1.5 × IQR):
Figure 16.2: Boxplot’s main characteristics
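To make these quantities concrete, here is a minimal Python sketch that computes the IQR and the two whisker limits for a small sample and lists the values lying beyond them. The function name iqr_bounds, the parameter k, and the sample data are hypothetical and chosen only for illustration. The sketch also assumes NumPy's default linear interpolation for quartiles; other tools may use slightly different quartile conventions and return slightly different limits.

```python
import numpy as np

def iqr_bounds(values, k=1.5):
    """Return (lower, upper) limits as Q1 - k*IQR and Q3 + k*IQR.

    Hypothetical helper for illustration; quartiles use NumPy's
    default (linear interpolation) percentile method.
    """
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1  # distance from the first to the third quartile
    return q1 - k * iqr, q3 + k * iqr

# Made-up sample with one obviously extreme value
data = np.array([2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 50], dtype=float)

lower, upper = iqr_bounds(data)

# Values that lie below the lower limit or above the upper limit
outside = data[(data < lower) | (data > upper)]

print(f"lower limit: {lower}, upper limit: {upper}")
print(f"values beyond the limits: {outside}")
```

For this sample, Q1 is 4.5 and Q3 is 9.5, so the IQR is 5, the limits are -3 and 17, and only the value 50 falls beyond them.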
Specifically, all observations that fall below the lower whisker or above the upper whisker are identified as outliers. This...