Estimating the correlation between two variables with a contingency table and a chi-squared test
Whereas univariate methods deal with single-variable observations, multivariate methods consider observations with several features. Multivariate datasets allow the study of relations between variables, more particularly their correlation, or lack thereof (that is, independence).
In this recipe, we will take a look at the same tennis dataset as in the first recipe of this chapter. Following a frequentist approach, we will estimate the correlation between the number of aces and the proportion of points won by a tennis player.
How to do it...
Let's import NumPy, pandas, SciPy.stats, and Matplotlib:
>>> import numpy as np import pandas as pd import scipy.stats as st import matplotlib.pyplot as plt %matplotlib inline
We download and load the dataset:
>>> player = 'Roger Federer' df = pd.read_csv('https://github.com/ipython-books/' 'cookbook-2nd...