Identifying multivariate outliers
Multivariate outliers are extreme values that occur in three or more variables simultaneously. In simple terms, such an observation differs from the other observations when the variables are considered collectively, even if it is not extreme in any single variable.
We can identify multivariate outliers using techniques such as the following:
- Mahalanobis distance: It measures the distance of each observation from the center of the data distribution (centroid). In simple terms, Mahalanobis distance quantifies how far each observation is from the mean vector (the set of mean values of the variables in a dataset, which acts as the centroid or central location of the data). It provides a measure of distance that is adjusted for the specific characteristics of the dataset, such as the correlations between variables and the variability of each variable, enabling a more accurate assessment of similarity between observations. This allows for the identification...
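To make the idea of Mahalanobis distance concrete, here is a minimal sketch using NumPy. The data is synthetic and purely illustrative: the function name mahalanobis_distances, the variable names, and the injected observation are all assumptions made for this example, not part of any particular dataset or library. The sketch computes the mean vector and covariance matrix, then measures each observation's distance from the centroid, adjusted for the variability of each variable and the correlations between them.

```python
import numpy as np


def mahalanobis_distances(X):
    """Return the Mahalanobis distance of each row of X from the mean vector."""
    X = np.asarray(X, dtype=float)
    mean_vector = X.mean(axis=0)          # centroid of the data
    cov = np.cov(X, rowvar=False)         # covariance between the variables
    inv_cov = np.linalg.pinv(cov)         # pseudo-inverse tolerates a near-singular matrix
    centered = X - mean_vector
    # Quadratic form (x - mean)^T * inv_cov * (x - mean), evaluated row by row
    squared = np.einsum("ij,jk,ik->i", centered, inv_cov, centered)
    return np.sqrt(np.maximum(squared, 0.0))


# Synthetic data (illustrative only): three variables, the first two strongly correlated
rng = np.random.default_rng(0)
cov_true = [[1.0, 0.9, 0.0],
            [0.9, 1.0, 0.0],
            [0.0, 0.0, 1.0]]
X = rng.multivariate_normal(mean=[0.0, 0.0, 0.0], cov=cov_true, size=200)

# Hypothetical observation that is unremarkable on each variable individually
# but breaks the positive relationship between the first two variables
X = np.vstack([X, [2.0, -2.0, 0.0]])

distances = mahalanobis_distances(X)
most_extreme = int(np.argmax(distances))
print("Index of the most extreme observation:", most_extreme)
print("Its Mahalanobis distance:", round(float(distances[most_extreme]), 2))
```

In this sketch, the injected observation lies within about two standard deviations on every individual variable, so a univariate check would likely miss it, yet its Mahalanobis distance is the largest in the dataset because it contradicts the correlation structure. A cutoff for flagging outliers still has to be chosen; one common choice, assuming roughly normally distributed data, is a high quantile of the chi-square distribution with degrees of freedom equal to the number of variables, applied to the squared distances.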