Sampling distributions
In Chapter 7, What Are the Chances? An Introduction to Statistics, we mentioned how much we love it when data follows the normal distribution. One reason is that many statistical tests (including the ones we will use in this chapter) rely on data that follows a normal pattern, yet a lot of real-world data is not normal (surprised?). Take our employee break data, for example. You might think I was just being fancy by creating data using the Poisson distribution, but I had a reason: I specifically wanted non-normal data, as shown:
pd.DataFrame(breaks).hist(bins=50, range=(5, 100))
Figure 8.4 – The histogram of our break-takers with a larger number of bins, showing more granularity
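If you would like to check the shape of a histogram like this numerically rather than by eye, the following is a minimal sketch. It does not use the chapter's own `breaks` array; instead it builds a hypothetical stand-in, `breaks_demo`, by mixing two Poisson samples whose parameters are assumptions chosen only to give a similar two-humped shape. The 45-minute split used to separate the two humps is likewise an assumption made for illustration. The sketch bins the data the same way as the histogram, finds the tallest bin on each side of the split, and runs SciPy's `normaltest`, where a small p-value is evidence against normality.

```python
import numpy as np
from scipy import stats

# Hypothetical stand-in for the chapter's break data: a mix of two
# Poisson samples. All parameters here are assumptions for illustration.
rng = np.random.default_rng(0)
short_breaks = rng.poisson(lam=15, size=6000) + 10
long_breaks = rng.poisson(lam=60, size=3000) + 10
breaks_demo = np.concatenate((short_breaks, long_breaks))

# Bin the data with the same settings as the histogram call.
counts, edges = np.histogram(breaks_demo, bins=50, range=(5, 100))
centers = (edges[:-1] + edges[1:]) / 2

# Find the tallest bin on each side of an assumed 45-minute split.
low = centers < 45
short_peak = centers[low][np.argmax(counts[low])]
long_peak = centers[~low][np.argmax(counts[~low])]

# D'Agostino-Pearson test: the null hypothesis is that the sample
# comes from a normal distribution.
statistic, p_value = stats.normaltest(breaks_demo)

print(f"Tallest bin below 45 minutes is centered near {short_peak:.1f}")
print(f"Tallest bin above 45 minutes is centered near {long_peak:.1f}")
print(f"Normality test p-value: {p_value:.3g}")
```

To try this on the real data, swap `breaks_demo` for the `breaks` array created earlier in the chapter. Keep in mind that a formal normality test on thousands of points will flag even small departures from normality, so it complements the histogram rather than replacing it.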
As you can see, our data clearly does not follow a normal distribution; it appears to be bimodal, which means that there are two peaks of break times, at around 25 and 70 minutes. As our data is...