Covariance and correlation
Suppose X and Y are random variables with the joint density function f(x, y). Then the following statistics are defined:

    μX = E[X],  μY = E[Y]
    σX² = E[(X − μX)²],  σY² = E[(Y − μY)²]
    σXY = E[(X − μX)(Y − μY)]
    ρXY = σXY / (σX σY)
The statistic σXY is called the covariance of the bivariate distribution, and ρXY is called the correlation coefficient.
It can be proved mathematically that if X and Y are independent, then σXY = 0. At the other extreme, if X and Y are completely dependent (for example, if one is a linear function of the other), then |σXY| = σX σY. These are the two extremes for covariance. Consequently, the correlation coefficient is a measure of the interdependence of X and Y, and is bounded as:

    −1 ≤ ρXY ≤ 1
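These sample statistics are straightforward to compute directly. The following is a minimal sketch (not the book's Listing 4-3; the class and method names here are illustrative) that computes the covariance and correlation coefficient of paired data, confirming that a perfect linear dependence yields ρ = 1:

```java
public class Correlation {
    // Population covariance of the paired samples x and y.
    public static double covariance(double[] x, double[] y) {
        double meanX = mean(x), meanY = mean(y);
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            sum += (x[i] - meanX) * (y[i] - meanY);
        }
        return sum / x.length;
    }

    // Correlation coefficient: covariance divided by the two standard deviations.
    public static double correlation(double[] x, double[] y) {
        return covariance(x, y)
                / (Math.sqrt(covariance(x, x)) * Math.sqrt(covariance(y, y)));
    }

    static double mean(double[] a) {
        double s = 0.0;
        for (double v : a) s += v;
        return s / a.length;
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 3, 4, 5};
        double[] y = {2, 4, 6, 8, 10};          // y = 2x: complete linear dependence
        System.out.println(correlation(x, y));  // prints 1.0
    }
}
```

Note that `covariance(x, x)` is simply the variance of X, so the same method serves for both the numerator and the denominator of ρXY.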
The program in Listing 4-3 helps explain how the correlation coefficient works. It defines three two-dimensional arrays at lines 14-16. Each array has two rows, one for the X values and one for the Y values.
The first array, data1, is generated by the random() method, which is defined at lines 24-31. It contains 1,000 (x, y) pairs: the X-values in data1[0], and the Y-values in data1...
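Listing 4-3 itself is not reproduced here, but the array layout it describes can be sketched as follows. This is an assumed reconstruction: the helper name randomPairs and the uniform distribution are illustrative, and the book's random() method may differ in detail.

```java
import java.util.Random;

public class RandomPairs {
    // Build a two-row array of n random pairs:
    // row 0 holds the x-values, row 1 holds the y-values.
    // (Sketch only; the distribution used by Listing 4-3 may differ.)
    public static double[][] randomPairs(int n, long seed) {
        Random rng = new Random(seed);
        double[][] data = new double[2][n];
        for (int i = 0; i < n; i++) {
            data[0][i] = rng.nextDouble();  // x drawn uniformly from [0, 1)
            data[1][i] = rng.nextDouble();  // y drawn independently of x
        }
        return data;
    }

    public static void main(String[] args) {
        double[][] data1 = randomPairs(1000, 42L);
        System.out.println(data1[0].length);  // prints 1000
    }
}
```

Because the x- and y-values here are drawn independently, the correlation coefficient of such an array should be close to zero, which is the behavior the program uses data1 to demonstrate.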