Finding outliers with the interquartile range proximity rule
If the variables are not normally distributed, we can identify outliers using the IQR proximity rule. According to this rule, data points that fall below the 25th quantile - 1.5 times the IQR, or above the 75th quantile + 1.5 times the IQR, are outliers. The IQR is the difference between the 75th and 25th quantiles.
Note
We described the IQR in the Visualizing outliers with boxplots recipe.
In this recipe, we will identify outliers using the IQR proximity rule.
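Before working with a real dataset, it may help to see the arithmetic of the rule on a handful of numbers. The following is a minimal sketch, not part of the recipe's steps: the values in the series are made up for illustration, and the variable names (values, lower_limit, upper_limit) are our own choices.

```python
import pandas as pd

# Toy data, invented for illustration only; 40 sits far from the rest.
values = pd.Series([2, 4, 5, 6, 7, 8, 10, 40])

# 25th and 75th quantiles (pandas interpolates linearly by default).
q1 = values.quantile(0.25)
q3 = values.quantile(0.75)
iqr = q3 - q1

# Limits according to the IQR proximity rule.
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr

# Keep only the observations that fall outside the limits.
outliers = values[(values < lower_limit) | (values > upper_limit)]

print(lower_limit, upper_limit)
print(outliers.tolist())  # [40]
```

Only the value 40 falls outside the limits here. Multiplying the IQR by 1.5 is the conventional choice stated in the rule; a larger multiplier would flag only more extreme values.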
How to do it...
Let’s begin the recipe by importing the Python libraries and loading the dataset:
- Import the required Python libraries:
  import numpy as np
  import pandas as pd
  from sklearn.datasets import fetch_california_housing
- Let’s load the California housing dataset from scikit-learn:
  X, y = fetch_california_housing(return_X_y=True, as_frame=True)
- Let’s create a function that returns the 25th quantile - 1.5 times the IQR, or the...