Descriptive statistics
A descriptive statistic is a function that computes a numeric value which in some way summarizes the data in a numeric dataset.
We saw two statistics in Chapter 3, Data Visualization: the sample mean, x̄, and the sample standard deviation, s. Their formulas are:
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
$$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2}$$
The mean summarizes the central tendency of the dataset. It is also called the simple average or mean average. The standard deviation is a measure of the dispersion of the dataset. Its square, s², is called the sample variance.
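As a rough illustration, the sample mean, sample variance, and sample standard deviation can be computed directly from their definitions. The sketch below is in Python and assumes the n − 1 divisor conventionally used for the sample standard deviation. The function names and the data values are made up for this example.

```python
import math

def sample_mean(xs):
    """Arithmetic mean of a non-empty sequence of numbers."""
    return sum(xs) / len(xs)

def sample_variance(xs):
    """Sample variance s^2; assumes the n - 1 divisor, so needs n >= 2."""
    m = sample_mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def sample_std(xs):
    """Sample standard deviation s, the square root of the sample variance."""
    return math.sqrt(sample_variance(xs))

# Hypothetical data, chosen only so the arithmetic is easy to follow.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

print(sample_mean(data))      # mean of the eight values
print(sample_variance(data))  # squared deviations sum to 32, divided by 7
print(sample_std(data))       # square root of the variance
```

For real work, Python's standard `statistics` module offers `mean`, `variance`, and `stdev`, which use the same n − 1 convention.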
The maximum of a dataset is its greatest value, the minimum is its least value, and the range is their difference.
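These three statistics map directly onto built-in operations in most languages. A minimal Python sketch, using made-up integer data:

```python
# Hypothetical dataset, for illustration only.
data = [12, 7, 31, 18, 25]

maximum = max(data)             # greatest value in the dataset
minimum = min(data)             # least value in the dataset
data_range = maximum - minimum  # range = maximum - minimum

print(maximum, minimum, data_range)  # prints: 31 7 24
```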
If w = (w₁, w₂, …, wₙ) is a vector with the same number of components as the dataset, then we can use it to define the weighted mean:
$$\sum_{i=1}^{n} w_i x_i = w_1 x_1 + w_2 x_2 + \cdots + w_n x_n$$
In linear algebra, this expression is called the inner product of the two vectors, w and x = (x₁, x₂, …, xₙ). Note that if we choose all the weights to be 1/n, then the resulting weighted mean is just the sample mean.
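The weighted mean, and the observation about uniform weights, can be checked with a short Python sketch. The function name `weighted_mean` and the data are hypothetical. Following the definition in the text, the weights are used exactly as given and are not normalized.

```python
def weighted_mean(ws, xs):
    """Inner product of the weight vector ws and the data vector xs."""
    if len(ws) != len(xs):
        raise ValueError("weights and data must have the same length")
    return sum(w * x for w, x in zip(ws, xs))

# Hypothetical data and weights, for illustration only.
data = [2.0, 4.0, 6.0, 8.0]
weights = [0.1, 0.2, 0.3, 0.4]

# Uniform weights 1/n should reproduce the ordinary sample mean.
uniform = [1 / len(data)] * len(data)

print(weighted_mean(weights, data))  # later values count more heavily
print(weighted_mean(uniform, data))  # matches the sample mean
print(sum(data) / len(data))         # sample mean, for comparison
```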
The median of a dataset...