Detecting anomalies using density estimation
In general, normal elements are more common than anomalous entries in any system. So, if the probability of occurrence of the elements in a collection is modeled by a Gaussian (normal) distribution, we can treat the elements whose estimated probability density is above a predefined threshold as normal, and those whose density falls below that threshold as probable anomalies.
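Let's say that $X$ is a random variable of $m$ rows. The following couple of formulae find the average and standard deviation for feature $j$, or, in other words, for all the elements of $X$ in the jth column if $X$ is represented as a matrix:

$$\mu_j = \frac{1}{m}\sum_{i=1}^{m} x_j^{(i)}, \qquad \sigma_j = \sqrt{\frac{1}{m}\sum_{i=1}^{m}\left(x_j^{(i)} - \mu_j\right)^2}$$

Here $x_j^{(i)}$ is the value of feature $j$ in row $i$.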
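Given a new entry $x$, the following formula calculates the probability density estimation:

$$p(x) = \prod_{j=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma_j}\exp\left(-\frac{(x_j - \mu_j)^2}{2\sigma_j^2}\right)$$

Here $n$ is the number of features, and $\mu_j$ and $\sigma_j$ are the mean and standard deviation of feature $j$.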
If $p(x)$ is less than a predefined threshold, then the entry is tagged as anomalous; otherwise, it is tagged as normal.
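The thresholding rule can be illustrated with a minimal Python sketch. It assumes the per-feature means and standard deviations have already been computed and that the features are treated as independent Gaussians. The function name `px` mirrors the method named in the text, but the signature, the helper `is_anomaly`, the threshold value, and the sample points are all assumptions made for illustration.

```python
import math

def px(x, mu, sigma):
    """Product of per-feature Gaussian densities for a single entry x.

    x, mu and sigma are equal-length sequences: the entry, the feature
    means, and the feature standard deviations (all assumed inputs).
    """
    density = 1.0
    for xj, mj, sj in zip(x, mu, sigma):
        norm = 1.0 / (math.sqrt(2.0 * math.pi) * sj)
        density *= norm * math.exp(-((xj - mj) ** 2) / (2.0 * sj ** 2))
    return density

def is_anomaly(x, mu, sigma, epsilon):
    """Tag x as anomalous when its estimated density is below epsilon."""
    return px(x, mu, sigma) < epsilon

# Hypothetical parameters: two features, each with mean 0 and std 1.
mu = [0.0, 0.0]
sigma = [1.0, 1.0]
epsilon = 0.01  # hypothetical threshold, normally tuned on labeled data

typical = px([0.0, 0.0], mu, sigma)  # density at the mean
outlier = px([4.0, 4.0], mu, sigma)  # density four std devs away on each axis
```

With these assumed parameters, the point at the mean has a much higher density than the distant point, so only the latter falls below the threshold and is tagged as anomalous.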
The following code finds the average value of the jth feature:
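As a minimal sketch of this step in Python, assuming the data set is stored as a list of equal-length rows, the mean and standard deviation of the jth feature could be computed as follows. The names `feature_mean` and `feature_std` and the sample matrix are illustrative assumptions, not the author's listing.

```python
import math

def feature_mean(X, j):
    """Average of column j over all rows of X."""
    return sum(row[j] for row in X) / len(X)

def feature_std(X, j):
    """Population standard deviation of column j (divides by the row count)."""
    mu = feature_mean(X, j)
    return math.sqrt(sum((row[j] - mu) ** 2 for row in X) / len(X))

# Hypothetical 4 x 2 data set used only for illustration.
X = [
    [1.0, 10.0],
    [2.0, 12.0],
    [3.0, 14.0],
    [4.0, 16.0],
]

mu0 = feature_mean(X, 0)     # mean of the first column
sigma0 = feature_std(X, 0)   # standard deviation of the first column
```

Dividing by the row count rather than by one less than it is a choice made for this sketch; for large data sets the difference is negligible.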
Here is a sample run of the px method:
>...