Significance testing proportions
Let's return to the question of whether the measured difference between male and female fatality rates could be due to chance alone. As in Chapter 2, Inference, our z-statistic is simply the difference in proportions divided by the pooled standard error:
$$z = \frac{p_1 - p_2}{SE}$$
In the preceding formula, $p_1$ denotes the proportion of women who survived, that is, 339/466 = 0.73, and $p_2$ denotes the proportion of men who survived, that is, 161/843 = 0.19.
To calculate the z-statistic, we need a pooled standard error for the two proportions. Our proportions measure the survival rates of females and males respectively, so the pooled standard error is simply the standard error of the two groups combined, based on the overall survival rate, as follows:
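As a sketch of that calculation: the pooled proportion follows directly from the counts quoted above, but the exact form of the standard error is an assumption, because the description admits two readings. Read literally, as the standard error of one combined sample of all passengers, it gives the first form below; the usual textbook form for comparing two proportions is the second. The symbols $\hat{p}$, $n_1$, $n_2$ and $n$, and the labels on $SE$, are introduced here for illustration only.

```latex
\hat{p} = \frac{339 + 161}{466 + 843} = \frac{500}{1309} \approx 0.38

SE_{\text{combined}} = \sqrt{\frac{\hat{p}\,(1-\hat{p})}{n}}, \qquad n = n_1 + n_2 = 1309

SE_{\text{two-sample}} = \sqrt{\hat{p}\,(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}, \qquad n_1 = 466,\; n_2 = 843
```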
Substituting the values into the equation for the z-statistic:
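The substitution can be sketched in plain Clojure using only the counts quoted in the text. This is an illustration under stated assumptions, not the chapter's own listing: the function names (`pooled-proportion`, `se-combined`, `se-two-sample`, `z-statistic`) are hypothetical, and because the form of the pooled standard error is not pinned down here, both the single combined-sample form and the common two-sample textbook form are computed.

```clojure
;; Counts quoted in the text: survivors and group sizes.
(def women-survived 339)
(def women-total    466)
(def men-survived   161)
(def men-total      843)

(defn pooled-proportion
  "Overall survival rate of both groups combined."
  [x1 n1 x2 n2]
  (/ (+ x1 x2) (double (+ n1 n2))))

(defn se-combined
  "ASSUMPTION: standard error of one combined sample, sqrt(p(1-p)/n)."
  [p n1 n2]
  (Math/sqrt (/ (* p (- 1 p)) (+ n1 n2))))

(defn se-two-sample
  "ASSUMPTION: textbook two-sample form, sqrt(p(1-p)(1/n1 + 1/n2))."
  [p n1 n2]
  (Math/sqrt (* p (- 1 p) (+ (/ 1.0 n1) (/ 1.0 n2)))))

(defn z-statistic
  "Difference in proportions divided by the supplied standard error."
  [p1 p2 se]
  (/ (- p1 p2) se))

(def p1 (/ women-survived (double women-total)))   ; about 0.73
(def p2 (/ men-survived (double men-total)))       ; about 0.19
(def p  (pooled-proportion women-survived women-total
                           men-survived men-total)) ; about 0.38

(def z-combined   (z-statistic p1 p2 (se-combined p women-total men-total)))
(def z-two-sample (z-statistic p1 p2 (se-two-sample p women-total men-total)))

(println "z (combined-sample SE):" z-combined)     ; roughly 39.9
(println "z (two-sample SE):     " z-two-sample)   ; roughly 19.1
```

Under either assumed form the statistic is many standard errors from zero, so the choice between them does not change the qualitative conclusion for these data.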
Using a z-score means we'll use the normal distribution to look up the p-value:
(defn ex-4-11 [] (let [dataset (load-data "titanic.tsv") proportions (fatalities-by-sex...
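For readers who want to see the normal-distribution lookup on its own, here is a dependency-free sketch. It is not the chapter's listing: `erfc-approx` and `upper-tail-p` are hypothetical names, and the Abramowitz and Stegun polynomial approximation (formula 7.1.26) is swapped in for a library CDF so that the block runs without extra dependencies.

```clojure
(defn erfc-approx
  "Complementary error function for x >= 0, using the Abramowitz-Stegun
  7.1.26 polynomial approximation (absolute error about 1.5e-7)."
  [x]
  (let [t    (/ 1.0 (+ 1.0 (* 0.3275911 x)))
        poly (* t (+ 0.254829592
                     (* t (+ -0.284496736
                             (* t (+ 1.421413741
                                     (* t (+ -1.453152027
                                             (* t 1.061405429)))))))))]
    (* poly (Math/exp (- (* x x))))))

(defn upper-tail-p
  "One-sided p-value P(Z >= |z|) under the standard normal distribution."
  [z]
  (* 0.5 (erfc-approx (/ (Math/abs (double z)) (Math/sqrt 2.0)))))

;; Sanity check against a familiar critical value.
(println (upper-tail-p 1.96))   ; close to 0.025
```

Doubling the result gives the two-sided p-value. Working with the upper tail directly, rather than subtracting a CDF value from 1, keeps very small p-values from being rounded to zero in floating point, which matters when the z-statistic is large.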