Using the Pearson correlation coefficient
The Pearson correlation coefficient is a number between -1.0 and 1.0 that measures the strength and direction of the linear relationship between two numerical series. A value of 1.0 indicates a perfect positive linear correlation, -1.0 indicates a perfect negative linear correlation, and 0.0 indicates no linear correlation between the series.
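For reference, the standard textbook definition of the coefficient for two series of equal length n can be written as follows. The symbols x_i, y_i, and the bars denoting the means are notation introduced here for illustration and do not appear in the recipe itself.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}},
\qquad
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,
\quad
\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i
```

The numerator grows when both series deviate from their means in the same direction, and the denominator normalizes the result into the range from -1.0 to 1.0.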
A brilliantly informative diagram created by Kiatdd, available at http://en.wikipedia.org/wiki/File:Correlation_coefficient.gif, is shown in the following figure:
For example, Nick is quite a generous movie critic who consistently awards movies with high ratings. His friend John might be a more dramatic critic who offers a wider range of ratings, yet the two friends tend to always agree on which movies they prefer.
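We can use the Pearson correlation coefficient to detect that there is a strong linear correspondence between the two friends' ratings. Because the coefficient is unaffected by shifting either series by a constant or rescaling it by a positive factor, Nick's consistently high scores and John's wider range do not hide their agreement.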
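The Nick and John scenario can be sketched in a few lines of Haskell. This is a minimal hand-rolled version that uses only the base library, so it is not the hstats implementation. The function names mean and pearsonR, and the rating values for nick and john, are hypothetical and chosen only for illustration.

```haskell
-- A hand-rolled Pearson correlation using only base.
-- Assumes both lists have the same length and neither is constant.

-- Arithmetic mean of a non-empty list.
mean :: [Double] -> Double
mean xs = sum xs / fromIntegral (length xs)

-- Pearson correlation coefficient of two equal-length series.
pearsonR :: [Double] -> [Double] -> Double
pearsonR xs ys = cov / sqrt (varX * varY)
  where
    mx   = mean xs
    my   = mean ys
    dxs  = map (subtract mx) xs          -- deviations of xs from their mean
    dys  = map (subtract my) ys          -- deviations of ys from their mean
    cov  = sum (zipWith (*) dxs dys)     -- sum of products of deviations
    varX = sum (map (\d -> d * d) dxs)   -- sum of squared deviations of xs
    varY = sum (map (\d -> d * d) dys)   -- sum of squared deviations of ys

-- Hypothetical ratings: Nick scores everything high and within a narrow band,
-- while John uses the full scale but ranks the movies in the same order.
nick, john :: [Double]
nick = [4.0, 4.5, 5.0, 4.0, 4.5]
john = [1.0, 3.0, 5.0, 1.0, 3.0]
```

Loading this file in GHCi and evaluating pearsonR nick john should give a value very close to 1.0, because the made-up ratings for John are an exact linear function of Nick's. Reversing one of the lists' preferences would push the result toward -1.0.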
Getting ready
Install the hstats library using cabal as follows:
$ cabal install hstats
Create a file with five star rating values on each...