Grubbs' test for multivariate data using Mahalanobis distance
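Grubbs' test can be applied to multivariate data by first reducing each observation to a single value, its squared Mahalanobis distance from the sample mean:
$$y_i^2 = (x_i - \bar{x})^T S^{-1} (x_i - \bar{x})$$
where $\bar{x}$ is the sample mean and $S$ is the covariance matrix of $X$.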
The following code computes these y-squared values from a given dataset:
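A minimal sketch of the distance computation in Python, using only the standard library. The function names `solve` and `mahalanobis_squared` are illustrative and not taken from the original listing, and the mean and covariance matrix are assumed to be supplied by the caller. The sample values at the bottom are made up for demonstration and are not the article's data.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the caller's matrix is left untouched.
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Pick the row with the largest entry in this column as the pivot.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate the entries below the pivot.
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def mahalanobis_squared(point, mean, cov):
    """Return (point - mean)^T cov^{-1} (point - mean)."""
    d = [p - mu for p, mu in zip(point, mean)]
    # Solving cov @ w = d avoids forming the inverse explicitly.
    w = solve(cov, d)
    # For a positive-definite covariance matrix this is never negative.
    return sum(di * wi for di, wi in zip(d, w))


# Illustrative values only: two independent variables with
# variances 4 and 9, centred at the origin.
mean = [0.0, 0.0]
cov = [[4.0, 0.0], [0.0, 9.0]]
print(mahalanobis_squared([2.0, 3.0], mean, cov))  # 2.0 (= 2^2/4 + 3^2/9)
```

Solving the linear system instead of inverting the covariance matrix is a design choice of this sketch; it gives the same value and tends to be better behaved numerically.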
The following are the functions to calculate the covariance matrix:
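One way such functions might look in Python is sketched below. The names `column_means` and `covariance_matrix` are hypothetical, and the sketch assumes the sample covariance (dividing by n - 1); the original code may have used a different divisor. The points are the four observations that appear in the output listing.

```python
def column_means(rows):
    """Mean of each variable (column) across all observations (rows)."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def covariance_matrix(rows, ddof=1):
    """Covariance matrix of the columns of `rows`.

    ddof=1 gives the sample covariance (divide by n - 1);
    ddof=0 gives the population covariance (divide by n).
    """
    n = len(rows)
    means = column_means(rows)
    # Centre every observation on the column means.
    centred = [[v - mu for v, mu in zip(row, means)] for row in rows]
    p = len(means)
    # Entry (i, j) is the averaged product of centred variables i and j.
    return [
        [sum(r[i] * r[j] for r in centred) / (n - ddof) for j in range(p)]
        for i in range(p)
    ]


points = [[2.0, 2.0], [2.0, 5.0], [6.0, 5.0], [100.0, 345.0]]
print(column_means(points))  # [27.5, 89.25]
for row in covariance_matrix(points):
    print(row)
```

The resulting matrix is symmetric, with the variance of each variable on the diagonal.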
The following is the input given:
[[2.0; 2.0]; [2.0; 5.0]; [6.0; 5.0]; [100.0; 345.0]]
This produces the following output:
ys = [([2.0; 2.0], -48066176.91); ([2.0; 5.0], -48066176.91); ([6.0; 5.0], -2584692.113); ([100.0; 345.0], -2.097348892e+12)]
Grubbs' test for univariate data can now be applied to these generated values:
[-48066176.91; -48066176.91; -2584692.113; -2.097348892e+12]
The absolute z-scores of these values, computed with the population standard deviation, are:
[0.5773335755; 0.5773335755; 0.5773836562; 1.732050807]
The z-score of the last entry is considerably larger than the z-scores of the rest, so the last element of the multivariate dataset, [100; 345], is flagged as anomalous.
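The z-scores quoted above can be reproduced with a short Python sketch. It assumes absolute z-scores based on the population standard deviation (dividing by n), since that choice matches the quoted figures; the helper name `abs_z_scores` is illustrative. The sketch only ranks the values and does not compare the largest score against a Grubbs critical value.

```python
import math


def abs_z_scores(values):
    """Absolute z-score of each value, using the population standard deviation."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [abs(v - mean) / std for v in values]


# The univariate values listed in the text.
ys = [-48066176.91, -48066176.91, -2584692.113, -2.097348892e12]
z = abs_z_scores(ys)
print(z)

# Index of the observation with the largest z-score, the outlier candidate.
suspect = max(range(len(z)), key=lambda i: z[i])
print(suspect)  # 3, i.e. the last entry
```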
Imagine...