Correlation between non-numeric and numeric variables
If you want to show graphically how a numeric variable is associated with a categorical (non-numeric) variable, a boxplot or a violin plot is a good choice. If you have ever needed to show the distribution of a variable while highlighting its key statistics, you are probably already familiar with the boxplot:
Figure 15.31: Graphical explanation of a boxplot
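The statistics a boxplot highlights can also be computed directly. The following is a minimal sketch in Python, assuming the common Tukey convention, in which the whiskers reach the most extreme points lying within 1.5 × IQR of the quartiles; the convention used in the figure may differ. The `boxplot_stats` helper and the sample values are hypothetical and introduced only for illustration:

```python
import statistics


def boxplot_stats(values):
    """Return the statistics a boxplot usually draws (Tukey convention, assumed)."""
    data = sorted(values)
    # Quartiles; method="inclusive" treats the sample as containing its own
    # minimum and maximum (an assumption, other quartile methods exist).
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    # Whiskers end at the most extreme observations inside the fences.
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        # Points beyond the fences are drawn individually as outliers.
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


ages = [2, 19, 22, 24, 27, 30, 35, 41, 80]  # made-up sample, not Titanic data
stats = boxplot_stats(ages)
print(stats)
```

With this sample, the box spans 22 to 35, the median is 27, and the values 2 and 80 fall outside the fences, so they would be drawn as individual outlier points.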
A violin plot is simply a combination of a histogram/distribution (density) plot and a boxplot for the same variable:
Figure 15.32: Graphical explanation of a violin plot
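To make the two ingredients of a violin plot concrete, here is a rough sketch in plain Python. It assumes a Gaussian kernel density estimate with a normal-reference rule-of-thumb bandwidth, which is one common choice and not necessarily what a given plotting library uses. The `gaussian_kde` helper and the sample values are hypothetical:

```python
import math
import statistics


def gaussian_kde(values, grid, bandwidth):
    """Estimate the density of `values` at each grid point with Gaussian kernels."""
    norm = 1.0 / (len(values) * bandwidth * math.sqrt(2 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values)
        for x in grid
    ]


ages = [19, 22, 24, 27, 30, 35, 41]  # made-up sample, not Titanic data

# Rule-of-thumb bandwidth (assumption): 1.06 * sigma * n^(-1/5).
bandwidth = 1.06 * statistics.stdev(ages) * len(ages) ** (-1 / 5)

# Evaluate the density on a grid that extends beyond the data range.
lo = min(ages) - 4 * bandwidth
hi = max(ages) + 4 * bandwidth
n_points = 200
step = (hi - lo) / (n_points - 1)
grid = [lo + i * step for i in range(n_points)]
density = gaussian_kde(ages, grid, bandwidth)

# Ingredient 1: the violin outline is the density mirrored around a central axis.
left_edge = [-d for d in density]
right_edge = density

# Ingredient 2: the boxplot statistics drawn inside the violin.
q1, median, q3 = statistics.quantiles(ages, n=4, method="inclusive")

# Sanity check: the estimated density should integrate to roughly 1.
area = sum(density) * step
print(round(area, 3), q1, median, q3)
```

The mirrored `left_edge` and `right_edge` curves give the violin its shape, while the quartiles supply the box drawn inside it.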
See the References section for more details about boxplots and violin plots.
If you need to relate a numeric variable to a categorical variable, you can create a violin plot for each element of the categorical variable. Returning to the example of the Titanic disaster dataset, given the Pclass (categorical) and Age (numeric...