In this chapter, we are going to discuss a practical application of unsupervised learning. Our goal is to train models that can either reproduce the probability density function of a specific data-generating process or identify whether a new sample is an inlier or an outlier. Generally speaking, the specific goal we want to pursue is finding anomalies: samples that are very unlikely under the model (that is, whose estimated probability satisfies p(x) ≪ λ for a predefined threshold λ) or that lie far from the centroid of the main distribution.
In particular, the chapter will cover the following topics:
- A brief introduction to probability density functions and their basic properties
- Histograms and their limitations
- Kernel density estimation (KDE)
- Bandwidth selection criteria
- Univariate example...
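As a preview of the thresholding idea described above, here is a minimal univariate sketch using SciPy's `gaussian_kde` (the data, the injected outlier value `8.0`, and the threshold `lam` are illustrative assumptions, not values from the chapter):

```python
import numpy as np
from scipy.stats import gaussian_kde

# Illustrative data: 500 samples from a standard normal distribution,
# plus one obvious outlier at x = 8.0
rng = np.random.default_rng(0)
inliers = rng.normal(loc=0.0, scale=1.0, size=500)
data = np.concatenate([inliers, [8.0]])

# Fit a univariate kernel density estimate on the inliers
# (gaussian_kde selects a bandwidth via Scott's rule by default)
kde = gaussian_kde(inliers)

# Flag samples whose estimated density falls below a threshold lambda;
# the value 1e-3 is chosen here purely for illustration
lam = 1e-3
densities = kde(data)
outliers = data[densities < lam]
```

A sample is declared an outlier whenever its estimated density is below `lam`; the point at 8.0 lies far outside the support of the training data, so its estimated density is effectively zero and it is flagged. Bandwidth selection, discussed later in the chapter, directly affects how sharply the density falls off in the tails and thus which samples cross the threshold.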