Intermediate statistics - associations
In the previous chapter, you learned about descriptive statistics methods for summarizing the distribution of discrete and continuous variables. In a data science project, the typical next step is to check for associations between pairs of variables.
A pair of variables can fall into one of three cases:
- Both variables are discrete
- Both variables are continuous
- One variable is discrete and the other is continuous
Beyond pairwise associations, this section also introduces linear regression, one of the most important statistical methods, in which you model a single response (or dependent) variable with a regression formula that includes one or more predictor (or independent) variables.
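As a minimal sketch of what such a regression formula looks like in practice, the following Python snippet fits a simple linear regression with a single predictor by ordinary least squares. The toy data and the names hours, score, and fit_simple_linear_regression are hypothetical and chosen only for illustration; they do not come from the chapter's own examples.

```python
from statistics import mean


def fit_simple_linear_regression(x, y):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    # Sum of cross-deviations of predictor and response from their means
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    # Sum of squared deviations of the predictor from its mean
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical toy data: one predictor (hours) and one response (score)
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
score = [3.0, 5.0, 7.0, 9.0, 11.0]

intercept, slope = fit_simple_linear_regression(hours, score)
print(f"score = {intercept:.2f} + {slope:.2f} * hours")
```

Here score plays the role of the response variable and hours the role of the single predictor. With more than one predictor, you would normally rely on a statistics library rather than hand-written formulas.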
Altogether, you will learn about the following in this section:
- Chi-squared test of independence of two discrete variables
- Phi coefficient, contingency coefficient, and Cramer's V coefficient that measure the association...