Testing hypotheses for non-parametric data
Both t-tests and ANOVA have a major drawback: the population that is being sampled must follow a normal distribution. In many applications, this is not too restrictive because many real-world population values follow a normal distribution, or some rules, such as the central limit theorem, allow us to analyze some related data. However, it is simply not true that all possible population values follow a normal distribution in any reasonable way. For these (thankfully, rare) cases, we need some alternative test statistics to use as replacements for t-tests and ANOVA.
In this recipe, we will use a Wilcoxon rank-sum test and the Kruskal-Wallis test to test for differences between two (or more, in the latter case) populations.
Getting ready
For this recipe, we will need the pandas package imported as pd
, the SciPy stats
module, and a default random number generator instance created using the following commands:
from numpy.random import...