Comparing two continuous columns
Evaluating how two continuous columns relate to one another is the essence of regression, but it is useful beyond that. If two columns are highly correlated with one another, you can often drop one of them as redundant. In this section, we will look at EDA for pairs of continuous columns.
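As a minimal sketch of the redundancy idea, assume a small made-up DataFrame. The column names `city_mpg`, `highway_mpg`, and `unrelated` and the 0.95 cutoff are illustrative and are not taken from the fueleco data. One way to flag a column that is nearly a copy of an earlier one is:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, not the fueleco dataset.
df = pd.DataFrame(
    {
        "city_mpg": [15, 18, 21, 24, 27, 30],
        "highway_mpg": [22, 25, 30, 33, 37, 41],
        "unrelated": [2, 9, 4, 7, 1, 6],
    }
)

threshold = 0.95  # arbitrary cutoff chosen for illustration

# Absolute Pearson correlations between every pair of columns.
corr = df.corr().abs()

# Keep only the entries above the diagonal so each pair is checked once.
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
upper = corr.where(mask)

# A column is flagged if it correlates strongly with any earlier column.
to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
reduced = df.drop(columns=to_drop)
```

Here `to_drop` contains only `highway_mpg`, because it tracks `city_mpg` almost linearly. Which column of a pair to keep is a judgment call; this sketch simply keeps the one that appears first.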
How to do it…
- Look at the covariance of the two columns if they are on the same scale:
>>> fueleco.city08.cov(fueleco.highway08)
46.33326023673625
>>> fueleco.city08.cov(fueleco.comb08)
47.41994667819079
>>> fueleco.city08.cov(fueleco.cylinders)
-5.931560263764761
- Look at the Pearson correlation between the two columns:
>>> fueleco.city08.corr(fueleco.highway08)
0.932494506228495
>>> fueleco.city08.corr(fueleco.cylinders)
-0.701654842382788
- Visualize the correlations in a heatmap:
>>> import seaborn as sns
>>> fig,...
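To illustrate why the covariance step carries the "same scale" caveat while the correlation step does not, here is a small sketch on made-up data. The column names and values are hypothetical and are not from fueleco. It converts both columns from miles per gallon to kilometres per litre and compares the two statistics, then builds the pairwise matrix with `DataFrame.corr()`, which is the kind of table a correlation heatmap is drawn from.

```python
import pandas as pd

# Hypothetical toy data, not the fueleco dataset.
df = pd.DataFrame(
    {
        "city_mpg": [15, 18, 21, 24, 27, 30],
        "highway_mpg": [22, 25, 30, 33, 37, 41],
    }
)

MPG_TO_KML = 0.425144  # approximate US mpg to km-per-litre factor

# Covariance and Pearson correlation in the original units.
cov_mpg = df.city_mpg.cov(df.highway_mpg)
corr_mpg = df.city_mpg.corr(df.highway_mpg)

# The same statistics after rescaling both columns.
kml = df * MPG_TO_KML
cov_kml = kml.city_mpg.cov(kml.highway_mpg)
corr_kml = kml.city_mpg.corr(kml.highway_mpg)

# Pearson correlation is the covariance divided by both standard deviations.
manual_corr = cov_mpg / (df.city_mpg.std() * df.highway_mpg.std())

# Pairwise correlation matrix for all numeric columns.
corr_matrix = df.corr()

print(f"covariance:  {cov_mpg:.3f} (mpg) vs {cov_kml:.3f} (km/l)")
print(f"correlation: {corr_mpg:.3f} (mpg) vs {corr_kml:.3f} (km/l)")
```

The covariance shrinks by the square of the conversion factor, while the correlation is unchanged. This is why correlation is the safer number to compare across pairs of columns measured in different units.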