Summary
To conclude the chapter, let's recap the main characteristics of outlier detection, the second unsupervised learning feature in the Elastic Stack. Outlier detection can be used to detect unusual data points in single-dimensional or multidimensional datasets.
The algorithm is based on an ensemble of four separate measures: two distance-based measures that use the kth-nearest neighbors and two density-based measures. Together, these measures capture how far a given data point is from its neighbors and from the general mass of data in the dataset. This unusualness is expressed as a numerical outlier score that ranges from 0 to 1. The closer a data point's score is to 1, the more unusual that point is in the dataset.
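To make the distance-based half of this idea concrete, here is a minimal sketch in plain Python. It is not the Elastic Stack implementation: the function names, the choice of k, the min-max normalization, and the simple averaging are all assumptions made for illustration, and the two density-based measures are left out entirely.

```python
import math


def knn_distance_scores(points, k=2):
    """Return two lists: distance to the kth-nearest neighbor and
    mean distance to the k nearest neighbors, one entry per point."""
    kth, mean = [], []
    for i, p in enumerate(points):
        # Sorted distances from p to every other point in the dataset.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        kth.append(dists[k - 1])
        mean.append(sum(dists[:k]) / k)
    return kth, mean


def min_max(values):
    """Rescale values to the range 0 to 1 (illustrative normalization only)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def toy_outlier_scores(points, k=2):
    """Average the two normalized distance measures into one toy score."""
    kth, mean = knn_distance_scores(points, k)
    return [(a + b) / 2 for a, b in zip(min_max(kth), min_max(mean))]


# Four points on a unit square plus one point far away from them.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
scores = toy_outlier_scores(points)
for point, score in zip(points, scores):
    print(point, round(score, 3))
```

In this toy dataset, the point at (10.0, 10.0) sits far from the other four, so it receives the highest score, which mirrors the interpretation of the outlier score described above.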
In addition to the outlier score, for each feature or field of a point, we compute a quantity known as the feature influence. The higher the feature influence for a given field, the more that field is responsible for a given point being unusual. These feature...