Analyzing the interquartile range (IQR) of a dataset
The IQR also measures the spread or variability of a dataset. It is simply the distance between the first and third quartiles (Q3 minus Q1). The IQR is a very useful statistic, especially when we need to identify where the middle 50% of values in a dataset lie. Unlike the range, which can be skewed by very high or low numbers (outliers), the IQR is largely unaffected by outliers since it focuses on the middle 50% of the values. It is also useful when we need to identify outliers in a dataset.
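As a rough illustration of the point above, the following sketch compares the range and the IQR on two small arrays that differ only in one extreme value. The numbers are hypothetical and chosen for illustration, and the 1.5 × IQR fences used to flag outliers are a common convention assumed here, not something this recipe prescribes.

```python
import numpy as np

# Hypothetical values, chosen only for illustration.
clean = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
with_outlier = np.array([1, 2, 3, 4, 5, 6, 7, 8, 100])


def spread(values):
    """Return (range, IQR) using NumPy's default linear interpolation."""
    q1, q3 = np.percentile(values, [25, 75])
    return float(values.max() - values.min()), float(q3 - q1)


range_clean, iqr_clean = spread(clean)
range_outlier, iqr_outlier = spread(with_outlier)

# Assumed convention (not from this recipe): values beyond
# Q1 - 1.5 * IQR or Q3 + 1.5 * IQR are flagged as outliers.
q1, q3 = np.percentile(with_outlier, [25, 75])
lower = q1 - 1.5 * (q3 - q1)
upper = q3 + 1.5 * (q3 - q1)
outliers = with_outlier[(with_outlier < lower) | (with_outlier > upper)]

print("clean:        range =", range_clean, " IQR =", iqr_clean)
print("with outlier: range =", range_outlier, " IQR =", iqr_outlier)
print("flagged outliers:", outliers)
```

The single extreme value changes the range considerably, while the IQR stays the same because the first and third quartiles do not move.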
To analyze the IQR of a dataset, we will use the iqr method from the stats module within the scipy library in Python.
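A minimal sketch of calling scipy.stats.iqr is shown here. The dataframe and its cases column are hypothetical stand-ins, not the COVID-19 dataset used in this recipe, and the cross-check with the pandas quantile method is included only to show that the result matches Q3 minus Q1.

```python
import pandas as pd
from scipy import stats

# Hypothetical data; the column name "cases" is an assumption for illustration.
df = pd.DataFrame({"cases": [10, 12, 15, 18, 20, 22, 25, 30, 500]})

# By default, iqr computes the 75th percentile minus the 25th percentile.
iqr_value = float(stats.iqr(df["cases"]))

# Cross-check against the quartiles computed by pandas.
q1 = df["cases"].quantile(0.25)
q3 = df["cases"].quantile(0.75)

print("IQR from scipy:", iqr_value)
print("Q3 - Q1 from pandas:", q3 - q1)
```

Both approaches use linear interpolation by default, so they should agree on the same input.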
Getting ready
We will work with the COVID-19 cases dataset again for this recipe.
How to do it…
We will explore how to compute the IQR using the scipy library:
- Import pandas and import the stats module from the scipy library:
  import pandas as pd
  from scipy import stats
- Load the .csv into a dataframe using read_csv. Then subset...