Descriptive statistics
A descriptive statistic is a function that computes a single number summarizing some aspect of a numeric dataset, such as its center or its spread.
We saw two statistics in Chapter 3, Data Visualization: the sample mean, x̄, and the sample standard deviation, s. Their formulas are:
data:image/s3,"s3://crabby-images/8878e/8878eaeadbb9c018dff2b40c53519cd07e2c0ca1" alt="Descriptive statistics"
data:image/s3,"s3://crabby-images/f73aa/f73aa568e8cf79f2fd2154f9924bbe84a347c05d" alt="Descriptive statistics"
The mean summarizes the central tendency of the dataset. It is also called the simple average or mean average. The standard deviation measures the dispersion of the dataset, that is, how widely its values are spread about the mean. Its square, s², is called the sample variance.
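The two formulas can be checked with a short sketch. The Python below is only an illustration: it assumes the usual n − 1 divisor for s, the helper names sample_mean and sample_std_dev are hypothetical, and the data values are made up. It cross-checks the hand-written versions against Python's standard statistics module, whose stdev and variance also divide by n − 1.

```python
import math
import statistics

def sample_mean(xs):
    """Sample mean: the sum of the values divided by their count n."""
    return sum(xs) / len(xs)

def sample_std_dev(xs):
    """Sample standard deviation s, using the n - 1 divisor (assumed)."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two values")
    m = sample_mean(xs)
    # Sum of squared deviations from the mean.
    squared_dev = sum((x - m) ** 2 for x in xs)
    return math.sqrt(squared_dev / (n - 1))

# Illustrative data only.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mean = sample_mean(data)      # 5.0
s = sample_std_dev(data)
variance = s ** 2             # the sample variance s^2

# Cross-check against the standard library (which also uses n - 1).
print(mean, statistics.mean(data))
print(s, statistics.stdev(data))
print(variance, statistics.variance(data))
```

Here the squared deviations from the mean 5.0 sum to 32, so the sample variance is 32/7, roughly 4.57. Dividing by n instead would give the population variance, which statistics.pvariance computes.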
The maximum of a dataset is its greatest value, the minimum is its least value, and the range is their difference.
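A minimal sketch of these three statistics in Python follows. The function name data_range is hypothetical, the data values are made up, and the built-in min and max do the real work. It assumes a non-empty dataset, because min and max raise ValueError on an empty sequence.

```python
def data_range(xs):
    """Return (minimum, maximum, range) for a non-empty dataset."""
    lo = min(xs)   # least value
    hi = max(xs)   # greatest value
    return lo, hi, hi - lo

# Illustrative data only.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
lo, hi, r = data_range(data)
print(lo, hi, r)  # prints 2.0 9.0 7.0
```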
If w = (w₁, w₂, …, wₙ) is a vector of weights with the same number of components as the dataset (normally nonnegative and summing to 1), then we can use it to define the weighted mean:
data:image/s3,"s3://crabby-images/ff89b/ff89bd82fda7c7f248f9606ee71b00d70580e01a" alt="Descriptive statistics"
In linear algebra, this expression is called the inner product of the two vectors, w and x = (x₁, x₂, …, xₙ). Note that if we choose all the weights to be 1/n, then the resulting weighted mean is just the sample mean.
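The weighted mean can be written directly as an inner product. In this sketch the helper weighted_mean is a hypothetical name, the data and the unequal weights in w are made-up illustrative values (the weights are chosen to sum to 1), and math.fsum is used only to reduce rounding error in the sum. The first call confirms that equal weights of 1/n reproduce the sample mean.

```python
import math

def weighted_mean(w, x):
    """Inner product of w and x: the sum of w_i * x_i over all components."""
    if len(w) != len(x):
        raise ValueError("w and x must have the same number of components")
    return math.fsum(wi * xi for wi, xi in zip(w, x))

# Illustrative data only.
x = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
n = len(x)

# Equal weights 1/n reproduce the sample mean.
equal = [1.0 / n] * n
print(weighted_mean(equal, x))  # prints 5.0

# Unequal weights (summing to 1) pull the mean toward the larger values.
w = [0.05, 0.05, 0.1, 0.1, 0.1, 0.1, 0.2, 0.3]
print(weighted_mean(w, x))      # approximately 6.2
```

Because the larger values receive the larger weights here, the weighted mean (about 6.2) sits above the simple mean of 5.0.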
The median...