Using grouped boxplots to view bivariate relationships between continuous and categorical features
Grouped boxplots are an underappreciated visualization. They are helpful when we're examining the relationship between a continuous and a categorical feature, since they show how the distribution of the continuous feature varies across the values of the categorical feature.
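To make the idea concrete, here is a minimal sketch, assuming pandas and a small made-up DataFrame (the column names `degree` and `wageincome` and all values are hypothetical, not NLS data). It computes the numbers a grouped boxplot draws for each category: the first quartile, the median, and the third quartile, plus the interquartile range that sets the box height.

```python
import pandas as pd

# Toy data, made up for illustration only: a continuous feature
# (wageincome) and a categorical feature (degree).
df = pd.DataFrame({
    "degree": ["HS", "HS", "HS", "HS", "BA", "BA", "BA", "BA"],
    "wageincome": [20, 30, 40, 50, 40, 60, 80, 100],
})

# A grouped boxplot draws one box per category. These quantiles are the
# box edges (q1, q3) and the line inside the box (median) for each group.
summary = (
    df.groupby("degree")["wageincome"]
    .quantile([0.25, 0.5, 0.75])
    .unstack()
)
summary.columns = ["q1", "median", "q3"]

# The interquartile range is the height of each box.
summary["iqr"] = summary["q3"] - summary["q1"]
print(summary)
```

Comparing the rows shows what the plot would show visually: the `BA` group has both a higher median and a wider spread than the `HS` group. With seaborn installed, `sns.boxplot(x="degree", y="wageincome", data=df)` would draw the corresponding grouped boxplot.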
We can explore this by returning to the National Longitudinal Survey (NLS) data we worked with in the previous chapter. The NLS has one observation per survey respondent but collects annual data on education and employment (data for each year is captured in different columns).
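Because each year sits in its own column, a single year's column can be used directly as the continuous feature in a grouped boxplot. Comparing across years usually means reshaping to a long layout first. The sketch below assumes pandas and a tiny hypothetical frame; the column names `personid`, `highestdegree`, `weeksworked16`, and `weeksworked17` and all values are illustrative assumptions, not taken from the actual NLS file.

```python
import pandas as pd

# Hypothetical wide layout: one row per respondent, one column per survey year.
nls = pd.DataFrame({
    "personid": [1, 2, 3],
    "highestdegree": ["HS", "BA", "HS"],
    "weeksworked16": [52, 48, 30],
    "weeksworked17": [50, 52, 40],
})

# Reshape to long form: one row per respondent per year.
long = nls.melt(
    id_vars=["personid", "highestdegree"],
    value_vars=["weeksworked16", "weeksworked17"],
    var_name="year",
    value_name="weeksworked",
)

# Turn column labels such as "weeksworked16" into an integer year (2016).
long["year"] = long["year"].str.replace("weeksworked", "20", regex=False).astype(int)
print(long)
```

In long form, `weeksworked` can be grouped by `highestdegree`, by `year`, or by both, which is the shape most plotting functions expect for grouped boxplots.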
Data Note
As stated in Chapter 1, Examining the Distribution of Features and Targets, the National Longitudinal Survey of Youth is conducted by the United States Bureau of Labor Statistics. The NLS data can be downloaded from https://www.nlsinfo.org/investigator/pages/search... Separate files for SPSS, Stata, and SAS can be downloaded from the same site.