Constructing the confidence interval for the population mean using the t-distribution
Let us review the process of statistical inference for the population mean. We start with a limited sample, from which we can derive the sample mean. Since we want to estimate the population mean, we use the observed sample mean to perform statistical inference and quantify the range in which the population parameter is likely to lie.
For example, the average miles per gallon, shown in the following code, is around 20 in the mtcars dataset:
>>> mean(mtcars$mpg)
20.09062
Given this result, we won’t be surprised to encounter another similar dataset with an average mpg of 19 or 21. However, we would be surprised if the value were 5, 50, or even 100. When assessing a new collection of samples, we need a way to quantify the variability of the sample mean across multiple samples. We have learned two ways to do this: use the bootstrap approach to simulate artificial samples or...
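As a rough illustration of what quantifying this variability can look like, the following sketch computes two 95% intervals for the mean mpg in base R: a bootstrap percentile interval and the t-based interval named in this section's title. The seed, the number of resamples (n_boot), the 95% level, and all variable names are assumptions chosen for the example, not values taken from the text.

```r
# Minimal sketch (assumed settings): two ways to quantify the variability
# of the sample mean of mtcars$mpg.
x <- mtcars$mpg
n <- length(x)

# Bootstrap approach: resample with replacement and record each resample's mean.
set.seed(42)      # arbitrary seed, used only for reproducibility
n_boot <- 2000    # number of resamples; an illustrative choice
boot_means <- replicate(n_boot, mean(sample(x, size = n, replace = TRUE)))

# The middle 95% of the bootstrap means gives a percentile interval.
boot_ci <- quantile(boot_means, probs = c(0.025, 0.975))

# t-based interval: sample mean plus or minus the critical t value
# (n - 1 degrees of freedom) times the standard error of the mean.
se <- sd(x) / sqrt(n)
t_crit <- qt(0.975, df = n - 1)
t_ci <- mean(x) + c(-1, 1) * t_crit * se

print(boot_ci)
print(t_ci)
```

The t-based interval computed by hand here should agree with the one returned by the built-in t.test(x)$conf.int, which offers a quick way to check the arithmetic.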