Spotting univariate outliers
Univariate outliers are unusually large or small values in a single variable of our dataset. They are extreme values that sit far from the rest of the values in that variable. Because they can distort summary statistics and model fits, it is important to identify and deal with them before any further analysis or modeling.
There are two major methods for identifying univariate outliers:
- Statistical measures: We can use statistical measures such as the interquartile range (IQR), the Z-score, and skewness.
- Data visualization: We can also use charts to spot outliers. Histograms, boxplots, and violin plots display the distribution of a variable, and the shape of that distribution can point to where the outliers lie.
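To make the statistical measures concrete, here is a minimal sketch of the IQR and Z-score rules using only Python's standard library. The prices list is made-up illustrative data (not the Amsterdam House Prices dataset), the helper names iqr_outliers and zscore_outliers are hypothetical, and the cutoffs of 1.5 times the IQR and an absolute Z-score of 3 are common conventions rather than fixed rules.

```python
import statistics

# Hypothetical house prices (in thousands); 2500 is planted as an extreme value.
prices = [310, 320, 325, 335, 340, 350, 355, 360,
          375, 380, 390, 410, 425, 440, 2500]


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


def zscore_outliers(values, threshold=3.0):
    """Return values whose absolute Z-score exceeds the threshold."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)  # population standard deviation
    if std == 0:
        return []  # all values identical, so nothing can be extreme
    return [v for v in values if abs(v - mean) / std > threshold]


print(iqr_outliers(prices))     # [2500]
print(zscore_outliers(prices))  # [2500]
```

Note that the Z-score rule relies on the mean and standard deviation, which the outlier itself inflates, so on small samples it can miss values that the IQR rule flags.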
We will explore how to spot univariate outliers using the histplot and boxplot methods in seaborn.
Getting ready
We will work with the Amsterdam House Prices data for this recipe...