Analyzing correlation among pairs of variables
This recipe walks you through using correlation to analyze a multivariate time series. Doing so helps you understand how the variables in the series relate to one another and, in turn, the dynamics of the series as a whole.
Getting ready
A common way to analyze the dynamics of multiple variables is to compute the correlation between each pair of them. You can also use this information for feature selection. For example, when two variables are highly correlated, they carry largely redundant information, so you may want to keep only one of them.
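The feature-selection idea can be sketched in a few lines of pandas. This is a minimal illustration, not the recipe's own method: the helper name drop_correlated, the 0.9 threshold, and the toy columns a, b, and c are all hypothetical choices. It inspects each pair once via the upper triangle of the absolute correlation matrix and drops the later column of any pair above the threshold.

```python
import numpy as np
import pandas as pd


def drop_correlated(df: pd.DataFrame, threshold: float = 0.9) -> pd.DataFrame:
    """Hypothetical helper: drop one column from every pair whose |Pearson r| exceeds threshold."""
    corr = df.corr(method="pearson").abs()
    # Keep only the upper triangle (k=1 excludes the diagonal) so each pair is inspected once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    # A column is dropped if it is highly correlated with any column that precedes it.
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop)


# Hypothetical toy data: 'b' is almost a linear copy of 'a', while 'c' is independent noise.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
df = pd.DataFrame({
    "a": a,
    "b": 2 * a + rng.normal(scale=0.01, size=200),
    "c": rng.normal(size=200),
})
reduced = drop_correlated(df, threshold=0.9)
print(list(reduced.columns))
```

Which member of a pair to keep is a design choice; this sketch simply keeps the column that appears first, whereas in practice you might keep the one that is easier to measure or interpret.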
How to do it…
First, we compute the correlation between each pair of variables:
corr_matrix = data_daily.corr(method='pearson')
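To make this computation concrete, here is a minimal sketch that builds a synthetic stand-in for data_daily and checks one entry of the matrix against the textbook Pearson formula: the covariance of two series divided by the product of their standard deviations. The column names load, temperature, and noise, the helper pearson, and the generated data are hypothetical and only serve as an illustration.

```python
import numpy as np
import pandas as pd


def pearson(x: np.ndarray, y: np.ndarray) -> float:
    """Pearson r: sum of centered cross-products over the product of the centered norms."""
    xc = x - x.mean()
    yc = y - y.mean()
    return float((xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum()))


# Hypothetical stand-in for data_daily: three daily series over one year.
rng = np.random.default_rng(42)
index = pd.date_range("2023-01-01", periods=365, freq="D")
trend = np.linspace(0, 10, 365)
data_daily = pd.DataFrame(
    {
        "load": trend + rng.normal(size=365),          # rises with the trend
        "temperature": -trend + rng.normal(size=365),  # falls with the trend
        "noise": rng.normal(size=365),                 # unrelated to the trend
    },
    index=index,
)

corr_matrix = data_daily.corr(method="pearson")
manual = pearson(data_daily["load"].to_numpy(), data_daily["temperature"].to_numpy())
print(corr_matrix.round(2))
print(round(manual, 6), round(corr_matrix.loc["load", "temperature"], 6))
```

The resulting matrix is symmetric with ones on the diagonal, which is why the heatmap carries each pair twice.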
We can visualize the results using a heatmap from the seaborn library:
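import seaborn as sns
import matplotlib.pyplot as plt

# Diverging heatmap centered at 0: negative correlations in blue, positive in red
sns.heatmap(data=corr_matrix,
            cmap=sns.diverging_palette(230, 20, as_cmap=True),
            xticklabels=data_daily.columns,
            yticklabels=data_daily.columns,
            center=0,
            square=True,
            linewidths=.5,
            cbar_kws={"shrink": .5})
# Rotate the x-axis labels so long variable names do not overlap
plt.xticks(rotation=30)
plt.show()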
Heatmaps are a common way of visualizing matrices. We pick a diverging color palette from sns.diverging_palette to distinguish between negative correlation (blue) and positive correlation (red), and set center=0 so that the neutral color of the palette corresponds to zero correlation.
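Reading pairs off a heatmap becomes tedious as the number of variables grows. As a complement, the same matrix can be flattened into a list of pairs ranked by the magnitude of their correlation. The sketch below is an assumption-laden illustration: the helper name ranked_pairs and the 3×3 matrix are hypothetical, and the sign is kept because it is what the heatmap colors encode.

```python
from itertools import combinations

import pandas as pd


def ranked_pairs(corr_matrix: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: list each variable pair once, sorted by |correlation|."""
    # combinations() yields each unordered pair of columns exactly once, skipping the diagonal.
    pairs = pd.Series(
        {(a, b): corr_matrix.loc[a, b] for a, b in combinations(corr_matrix.columns, 2)}
    )
    # Sort by magnitude but return the signed values.
    return pairs.sort_values(key=lambda v: v.abs(), ascending=False)


# Hypothetical correlation matrix for three variables.
corr_matrix = pd.DataFrame(
    [[1.0, -0.8, 0.1], [-0.8, 1.0, 0.4], [0.1, 0.4, 1.0]],
    index=["x", "y", "z"],
    columns=["x", "y", "z"],
)
print(ranked_pairs(corr_matrix))
```

In this toy matrix, the strongest relationship is the negative one between x and y, which would show up as the darkest blue cell in the heatmap.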
How it works…
The following figure shows the heatmap with the correlation results:
Figure 1.7: Correlation matrix for a multivariate time series
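The corr() method computes the correlation between each pair of variables (columns) in the data_daily object. In this case, we use the Pearson correlation, which measures the strength of the linear relationship between two variables, by passing the method='pearson' argument. Kendall and Spearman are two common alternatives to the Pearson correlation. Both are rank-based, so they capture monotonic relationships that need not be linear, and they can be selected with method='kendall' and method='spearman', respectively.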
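A small synthetic example helps show why the choice of method matters. In the sketch below (hypothetical data, not from the recipe), y grows monotonically but exponentially with x. Spearman, which correlates ranks, reports a perfect relationship, while Pearson reports a noticeably weaker one because the relationship is not linear. Kendall is left out of the code because, in the pandas versions we are aware of, method='kendall' delegates to SciPy, which may not be installed.

```python
import numpy as np
import pandas as pd

# Hypothetical data: y grows monotonically with x, but far from linearly.
x = np.arange(1, 21, dtype=float)
df = pd.DataFrame({"x": x, "y": np.exp(x / 2)})

results = {}
for method in ("pearson", "spearman"):
    # Only the x-y entry of the 2x2 correlation matrix is of interest here.
    results[method] = df.corr(method=method).loc["x", "y"]
    print(f"{method:>8}: {results[method]:.3f}")
```

If a multivariate series contains variables related in this way, a Pearson-based threshold for feature selection may miss redundant pairs that a rank-based method would flag.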