Using boxplots to identify outliers for continuous variables
Boxplots are essentially a graphical representation of our work in the Identifying outliers with one variable recipe in Chapter 4, Identifying Missing Values and Outliers in Subsets of Data. There, we used the concept of interquartile range (IQR), the distance between the value at the first quartile and the value at the third quartile, to determine outliers. Any value greater than the third quartile value + (1.5 * IQR), or less than the first quartile value - (1.5 * IQR), was considered an outlier. That is precisely what is revealed in a boxplot.
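The outlier rule described above can be sketched in a few lines of pandas. This is a minimal illustration, not the recipe's own code: the helper name iqr_fences and the sample values are hypothetical and are not drawn from the NLS or coronavirus data.

```python
import pandas as pd


def iqr_fences(values: pd.Series, k: float = 1.5) -> tuple[float, float]:
    """Return the (lower, upper) outlier thresholds for a numeric series."""
    q1 = values.quantile(0.25)  # value at the first quartile
    q3 = values.quantile(0.75)  # value at the third quartile
    iqr = q3 - q1               # interquartile range
    return q1 - k * iqr, q3 + k * iqr


# Made-up values for illustration only (an assumption, not real data).
sample = pd.Series([10, 12, 13, 14, 15, 16, 18, 60])

lower, upper = iqr_fences(sample)

# Keep only the values that fall outside the two thresholds.
outliers = sample[(sample < lower) | (sample > upper)]

print(lower, upper)
print(outliers.tolist())
```

With Matplotlib's default setting (whis=1.5), a boxplot draws its whiskers out to the most extreme data points that still fall inside these thresholds and plots anything beyond them as individual points, so the values flagged here should correspond to the points shown past the whiskers.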
Getting ready
We will work with cumulative data on coronavirus cases and deaths by country, and the National Longitudinal Surveys (NLS) data. You will need the Matplotlib library to run the code on your computer.
How to do it…
We use boxplots to show the shape and spread of Scholastic Assessment Test (SAT) scores, weeks worked, and COVID-19 cases...