Estimating the correlation between two variables with a contingency table and a chi-squared test
Whereas univariate methods deal with single-variable observations, multivariate methods consider observations with several features. Multivariate datasets allow us to study relations between variables, in particular whether they are correlated or independent.
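To make the idea concrete, here is a minimal sketch on synthetic data (not the tennis dataset). One pair of variables is linearly related and another pair is generated independently. The Pearson correlation coefficient, computed with NumPy's `np.corrcoef`, comes out far from 0 for the first pair and close to 0 for the second. A coefficient near 0 is consistent with independence but does not prove it, because correlation only captures linear association.

```python
import numpy as np

# Synthetic data for illustration only (not the tennis dataset).
rng = np.random.default_rng(0)
n = 10_000

x = rng.normal(size=n)
y_dependent = 2 * x + rng.normal(size=n)  # linearly related to x
y_independent = rng.normal(size=n)        # generated independently of x

# np.corrcoef returns the 2x2 correlation matrix; entry [0, 1] is the
# Pearson correlation coefficient between the two inputs.
r_dependent = np.corrcoef(x, y_dependent)[0, 1]
r_independent = np.corrcoef(x, y_independent)[0, 1]

print(f"dependent pair:   r = {r_dependent:.3f}")
print(f"independent pair: r = {r_independent:.3f}")
```

For the dependent pair, the theoretical value is 2/sqrt(5), about 0.89, and the sample estimate should land close to it.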
In this recipe, we will take a look at the same tennis dataset as in the first recipe of this chapter. Following a frequentist approach, we will estimate the correlation between the number of aces and the proportion of points won by a tennis player.
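The technique named in this recipe's title can be sketched as follows. This is a minimal sketch under stated assumptions: the per-match values are synthetic, and the column names `aces` and `points_won` are hypothetical placeholders, not the tennis dataset's actual columns. Each variable is split at its median into a binary indicator, the two indicators are cross-tabulated into a 2x2 contingency table with `pd.crosstab`, and `scipy.stats.chi2_contingency` tests the null hypothesis that the indicators are independent. A Pearson correlation on the raw values (`scipy.stats.pearsonr`) is shown alongside for comparison.

```python
import numpy as np
import pandas as pd
import scipy.stats as st

# Hypothetical, synthetic per-match data; the real dataset's columns differ.
rng = np.random.default_rng(42)
n_matches = 500
aces = rng.poisson(lam=5, size=n_matches)
# Assumption for illustration only: more aces -> slightly higher share of points won.
points_won = np.clip(0.45 + 0.01 * aces + rng.normal(0, 0.05, n_matches), 0, 1)

df = pd.DataFrame({"aces": aces, "points_won": points_won})

# Pearson correlation on the raw values: coefficient and two-sided p-value.
r, p_corr = st.pearsonr(df["aces"], df["points_won"])

# Binarize each variable at its median to obtain two boolean indicators.
df["many_aces"] = df["aces"] > df["aces"].median()
df["won_many_points"] = df["points_won"] > df["points_won"].median()

# 2x2 contingency table: counts of matches for each combination of indicators.
table = pd.crosstab(df["won_many_points"], df["many_aces"])

# Chi-squared test of independence (Yates' continuity correction is applied
# by default for 2x2 tables).
chi2, p_chi2, dof, expected = st.chi2_contingency(table)

print(table)
print(f"Pearson r = {r:.3f} (p = {p_corr:.2g})")
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p_chi2:.2g}")
```

A small p-value leads us to reject the hypothesis that the two indicators are independent. Binarizing discards information, so the chi-squared test answers a coarser question than the correlation coefficient, but it makes no assumption of linearity.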
Getting ready
Download the Tennis dataset from the book's GitHub repository at https://github.com/ipython-books/cookbook-data, and extract it into the current directory.
How to do it...
Let's import NumPy, pandas, scipy.stats, and matplotlib:
In [1]: import numpy as np
        import pandas as pd
        import scipy.stats as st
        import matplotlib.pyplot as plt
        %matplotlib...