Computing the coefficient of a correlation
In the Using the built-in statistics library and Average of values in a counter recipes in this chapter, we looked at ways to summarize data. These recipes showed how to compute a central value, as well as variance and extrema.
Another common statistical summary involves the degree of correlation between two sets of data. One commonly used metric for correlation is called Pearson's r. The r-value is a number between -1 and +1 that expresses the degree to which the data values correlate with one another.
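A value of zero says there is no linear relationship between the two variables; it does not by itself mean the data is random. A value of 0.95 indicates a strong positive linear relationship: when one variable increases, the other tends to increase as well. A value of -0.95 indicates an equally strong inverse correlation: when one variable increases, the other decreases. The r-value is not a percentage of values that match; it is the square of r that gives the fraction of the variance in one variable accounted for by a linear relationship with the other.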
Before Python 3.10, this was not directly supported by Python's standard library; the statistics module gained a correlation() function in that release. We can follow a design pattern similar to those shown in the Average of values in a counter recipe...
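As a rough illustration of what such a computation might look like, here is a minimal sketch that builds Pearson's r from a handful of sums. The function name pearson_r and its parameters xs and ys are hypothetical names chosen for this example, not the recipe's own code, and the sketch assumes the two series are already available as equal-length sequences of numbers.

```python
from math import sqrt
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Compute Pearson's r for two equal-length series.

    Hypothetical helper for illustration; uses the sums-based form
    r = (n*Sxy - Sx*Sy) / (sqrt(n*Sxx - Sx**2) * sqrt(n*Syy - Sy**2)).
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two data points")

    # The intermediate sums that the formula depends on.
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))

    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    if denominator == 0:
        # A constant series has zero variance, so r is undefined.
        raise ValueError("r is undefined when either series is constant")
    return numerator / denominator
```

Calling pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]) should give a value very close to +1, and reversing the second series should give a value very close to -1; floating-point rounding means the results may not be exactly 1.0 or -1.0.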