What outliers are
Generally, outliers are defined as observations that lie at an unusual distance from the other observations in a data sample. In other words, they are uncommon values in a dataset. There is no fixed measure of what counts as an abnormal distance; it depends strictly on the dataset you’re analyzing. Simply put, it is the analyst who decides the threshold beyond which a distance is considered abnormal, based on their experience and functional knowledge of the business reality the dataset represents.
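To make the role of the analyst's choice concrete, here is a minimal sketch in Python. It assumes one common robust rule, which is not necessarily the one used later in this chapter: a value is flagged when its distance from the median exceeds a chosen multiple of the median absolute deviation (MAD). The function name `flag_outliers`, the `threshold` parameter, and the `daily_orders` figures are all hypothetical and serve only as illustration.

```python
from statistics import median


def flag_outliers(values, threshold):
    """Return the values whose distance from the median is more than
    `threshold` times the median absolute deviation (MAD)."""
    center = median(values)
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        # At least half of the values equal the median, so the MAD gives
        # no usable scale; this sketch simply flags nothing in that case.
        return []
    return [v for v in values if abs(v - center) / mad > threshold]


# Hypothetical sample: number of orders received per day
daily_orders = [10, 12, 11, 13, 12, 11, 14, 19, 95]

# The threshold is the analyst's decision, not a property of the data
strict = flag_outliers(daily_orders, threshold=3)
lenient = flag_outliers(daily_orders, threshold=10)

print(strict)   # [19, 95]
print(lenient)  # [95]
```

The same sample yields two different sets of outliers depending only on the threshold: with a multiplier of 3, the day with 19 orders is considered abnormal, while with a multiplier of 10 it is not.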
IMPORTANT NOTE
It makes sense to talk about outliers for numeric variables, or for numeric variables grouped by elements of categorical variables. It does not make sense to talk about outliers for categorical variables alone.
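The difference between looking at a numeric variable as a whole and looking at it within each category can be sketched as follows. This is only an illustration under stated assumptions: the `sales` records are invented, and the helper `mad_outliers` applies a median/MAD rule with a multiplier of 3 chosen arbitrarily for the example.

```python
from collections import defaultdict
from statistics import median


def mad_outliers(values, threshold=3):
    """Flag values farther than `threshold` MADs from the median."""
    center = median(values)
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        # No usable spread estimate; flag nothing in this sketch
        return []
    return [v for v in values if abs(v - center) / mad > threshold]


# Hypothetical (category, unit_price) records
sales = [
    ("pen", 1), ("pen", 2), ("pen", 1), ("pen", 2), ("pen", 2), ("pen", 40),
    ("laptop", 900), ("laptop", 950), ("laptop", 1000),
    ("laptop", 980), ("laptop", 940),
]

# Numeric variable on its own: all prices pooled together
pooled = mad_outliers([price for _, price in sales])

# Numeric variable grouped by the elements of the categorical variable
by_category = defaultdict(list)
for category, price in sales:
    by_category[category].append(price)
grouped = {cat: mad_outliers(prices) for cat, prices in by_category.items()}

print(pooled)   # [900, 950, 1000, 980, 940]
print(grouped)  # {'pen': [40], 'laptop': []}
```

In the pooled view, every laptop price is flagged and the pen priced at 40 goes unnoticed, because 40 happens to be the overall median. Within each category the picture reverses: the laptop prices are unremarkable and only the 40 stands out among the pens.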
But why is there so much focus on outlier management? The answer is that they very often have undesirable macroscopic effects on some statistical operations. The most...