We need a statistical method that quantitatively measures the degree of association between two features. The covariance of two variables does exactly that, so let's see how it is calculated. Given two variables, x and y, each with n observations, we first center their values around their mean values, $\bar{x}$ and $\bar{y}$; then, we multiply the centered values and take the mean of the product:

$$\operatorname{cov}(x, y) = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$$
This definition implies that if both variables increase or decrease at the same time, then the covariance is positive, whereas if they move in opposite directions, then the covariance is negative. If there is no correlation, the covariance value will be small, that is, close to zero.
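The sign behavior described above can be checked with a minimal sketch. The `covariance` helper and the toy lists below are illustrative assumptions, not taken from the text; the helper uses only the standard library and divides by n, following the "mean of the product" wording.

```python
def covariance(x, y):
    """Mean of the product of the centered values (divides by n)."""
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Center each variable around its mean, multiply pairwise, then average.
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n


# Hypothetical toy data chosen to show the three cases.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
same_direction = [2.0, 4.0, 6.0, 8.0, 10.0]      # rises with x
opposite_direction = [10.0, 8.0, 6.0, 4.0, 2.0]  # falls as x rises
no_trend = [4.0, 1.0, 5.0, 1.0, 4.0]             # no linear trend against x

cov_same = covariance(x, same_direction)
cov_opposite = covariance(x, opposite_direction)
cov_none = covariance(x, no_trend)

print(cov_same)      # 4.0
print(cov_opposite)  # -4.0
print(cov_none)      # 0.0
```

Note that library routines may use a different normalization (for example, dividing by n - 1 for a sample estimate), so their results can differ slightly from this sketch.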
It is also clear from the definition that, since the variables keep their scale, it is difficult to compare features that have very different mean values and it is impossible to compare two covariances....
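To illustrate why the scale matters, here is a small sketch with made-up measurements (the heights and weights are hypothetical, and `covariance` is the same illustrative helper that divides by n). Expressing the same heights in centimeters instead of meters multiplies the covariance by 100, even though the relationship between the two features is unchanged.

```python
import math


def covariance(x, y):
    """Mean of the product of the centered values (divides by n)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n


# Hypothetical measurements, for illustration only.
height_m = [1.5, 1.6, 1.7, 1.8, 1.9]
weight_kg = [50.0, 60.0, 65.0, 80.0, 85.0]

# Same heights, different unit: only the scale of the feature changes.
height_cm = [h * 100 for h in height_m]

cov_m = covariance(height_m, weight_kg)
cov_cm = covariance(height_cm, weight_kg)

print(cov_m, cov_cm)
# The ratio reflects the unit change, not a stronger association.
print(math.isclose(cov_cm, 100 * cov_m))
```

Because the value depends on the units of both variables, two covariances computed on differently scaled features cannot be ranked against each other directly.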