Chi-squared statistic to determine anomalies
Ye and Chen used a $\chi^2$ statistic to determine anomalies in operating system call data. The training phase assumes that the normal data has a multivariate normal distribution. The value of the statistic is determined as:
$$\chi^2 = \sum_{i=1}^{n} \frac{(X_i - E_i)^2}{E_i}$$
where $X_i$ denotes the observed value of the $i$th variable, $E_i$ is the expected value of the $i$th variable (obtained from the training data), and $n$ is the number of variables. A large value of $\chi^2$ indicates that the observed sample contains anomalies.
The following function calculates the respective $\chi^2$ values for all the elements in a collection:
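A minimal sketch of such a function in F#, assuming each element's term is the squared difference between the observed and expected value, divided by the expected value. The names `chiSquaredTerms` and `chiSquared` are hypothetical and chosen only for illustration; this is not the original listing.

```fsharp
// Hypothetical helper (an assumption, not the original listing):
// pairs each observed value x with its term (x - e)^2 / e.
let chiSquaredTerms (observed: float list) (expected: float list) =
    // List.map2 raises an exception if the two lists differ in length.
    List.map2 (fun x e -> x, (x - e) ** 2.0 / e) observed expected

// Overall statistic: the sum of the per-element terms.
let chiSquared (observed: float list) (expected: float list) =
    chiSquaredTerms observed expected |> List.sumBy snd

let observed = [1.; 100.; 2.; 4.5; 2.55; 70.]
let expected = [111.; 100.; 2.; 4.5; 2.55; 710.]

// Print each observed value next to its term, then the total.
chiSquaredTerms observed expected
|> List.iter (fun (x, term) -> printfn "%g -> %g" x term)
printfn "total = %g" (chiSquared observed expected)
```

With the observed and expected lists used in the example that follows, this sketch gives roughly 109.01 for the first element, 576.90 for the last, and 0 for the others. The figures 12100.0 and 5851.428571 quoted in the result correspond to dividing each squared difference by the observed value rather than the expected one. Either way, the first and last elements are the ones that stand out.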
When this function is called with the data [1.;100.;2.;4.5;2.55;70.] as the observed values and [111.;100.;2.;4.5;2.55;710.] as the expected values, the following result is obtained:
[(1.0, 12100.0); (100.0, 0.0); (2.0, 0.0); (4.5, 0.0); (2.55, 0.0); (70.0, 5851.428571)]
As you can see, the value of $\chi^2$ is very high (12100.0 and 5851.428571) for the first and last observations. This means that the...