t-digest
t-digest is a data structure for estimating quantiles from a data stream or a large dataset using a compact sketch.
t-digest can answer questions such as “What fraction of the values in the data stream is smaller than a given value?” and “How many values in the data stream are smaller than a given value?” To understand t-digest, we first need to define quantiles and percentiles.
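For intuition, both questions can be answered exactly on a small dataset that fits in memory. The sketch below is not t-digest itself: it keeps every value in a sorted list, which is exactly what t-digest avoids, and the helper names `count_below` and `fraction_below` are made up for this illustration. t-digest approximates the same answers from a compact summary.

```python
from bisect import bisect_left

def count_below(sorted_values, threshold):
    """Exact count of values strictly smaller than threshold.

    Hypothetical helper for illustration; requires a sorted list.
    """
    return bisect_left(sorted_values, threshold)

def fraction_below(sorted_values, threshold):
    """Exact fraction of values strictly smaller than threshold."""
    return count_below(sorted_values, threshold) / len(sorted_values)

# A tiny stand-in for a data stream, sorted once up front.
values = sorted([12, 7, 3, 25, 18, 7, 9, 30])

print(count_below(values, 10))     # how many values are smaller than 10
print(fraction_below(values, 10))  # what fraction of values is smaller than 10
```

The exact approach needs memory proportional to the number of values, which is why a sketch structure is preferred for long streams.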
A quantile is a cut point that divides a dataset into intervals containing equal proportions of the observations. For example, the median is a quantile: it divides the dataset in half, with 50% of observations below it and 50% above.
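As a small illustration of this definition, the following sketch uses Python's standard `statistics` module to compute the median and the quartile cut points of a sample list. The data is invented for the example, and these are exact computations over the full dataset rather than t-digest estimates.

```python
import statistics

# Invented sample data with an odd number of distinct values.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# The median is the quantile that splits the data in half.
median = statistics.median(data)
below = sum(1 for x in data if x < median)
above = sum(1 for x in data if x > median)
print(median, below, above)

# Quartiles are the three cut points that split the data into four
# intervals of equal proportion; the middle cut point is the median.
quartiles = statistics.quantiles(data, n=4)
print(quartiles)
```

The same number of observations falls on each side of the median, which is the "equal proportions" property that all quantiles share.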
A percentile is a value below which a given percentage of the data falls. For example, if a value is at the 75th percentile of a dataset, 75% of the data falls below that value. Percentiles are...