In classical statistical inference, we often answer questions about a population, which is a hypothetical group of all possible values and data (including future ones). A sample, on the other hand, is a subset of the population that we use to observe values. In classical statistical inference, we often seek to answer questions about a fixed, non-random, unknown population parameter.
Confidence intervals are computed from data, and are expected to contain θ. We may refer to, say, a 95% confidence interval—that is, an interval that we are 95% confident contains θ, in the sense that there is a 95% chance that when we compute such an interval, we capture θ in it.
This section focuses on binary variables, where the variable is either a success or a failure, and successes occur with a proportion or probability of p.
An example situation of this is tracking whether a visitor to a website clicked on an ad during their visit. Often, these variables are encoded numerically, with 1 for success, and 0 for a failure.
In classical statistics, we assume that our data is a random sample drawn from a population with a fixed, yet unknown, proportion, p. We can construct a confidence interval based on the sample proportion, which gives us an idea of the proportion of the population. A 95% confidence interval captures the proportion of the population approximately 95% of the time. We can construct confidence intervals using the proportion_confint() function, which is found in the statsmodel package, which allows the easy computation of confidence intervals. Let's now see this in action!