What outliers are and how to deal with them
Generally, outliers are defined as observations that lie at an abnormal distance from the other observations in a data sample. In other words, they are uncommon values in a dataset. This abnormal distance has no fixed measurement; it depends strictly on the dataset you're analyzing. Simply put, it is the analyst who decides the distance beyond which an observation is considered abnormal, based on their experience and functional knowledge of the business reality represented by the dataset.
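To make the idea of an analyst-chosen distance concrete, here is a minimal sketch in Python. It assumes one common convention that the text above does not prescribe: flagging values that lie more than k times the interquartile range (IQR) outside the quartiles. The function name flag_outliers, the multiplier k, and the daily_sales figures are all hypothetical, chosen only for illustration.

```python
from statistics import quantiles


def flag_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles.

    k is the analyst's choice: a larger k means a more tolerant rule.
    """
    # Quartiles computed with linear interpolation over the sample itself.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical sample: units sold per day.
daily_sales = [12, 14, 15, 15, 16, 17, 18, 19, 27, 95]

print(flag_outliers(daily_sales))         # [27, 95]
print(flag_outliers(daily_sales, k=3.0))  # [95]
```

The same sample yields different outliers depending on k, which is the point: the threshold is a judgment call informed by what the data represents, not a property of the data itself.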
Important Note
It makes sense to talk about outliers for numeric variables, or for numeric variables grouped by the levels of a categorical variable. It makes no sense to talk about outliers for categorical variables on their own.
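The grouped case in the note can be sketched as follows. This is an illustration under assumptions, not a prescribed method: it reuses the k times IQR rule as the (assumed) definition of abnormal distance, and the function name flag_outliers_by_group and the product prices are hypothetical.

```python
from collections import defaultdict
from statistics import quantiles


def flag_outliers_by_group(rows, k=1.5):
    """Flag (category, value) pairs whose value is abnormal within its category.

    rows is an iterable of (category, value) pairs; each category gets
    its own quartiles, so values are only compared with their peers.
    """
    groups = defaultdict(list)
    for category, value in rows:
        groups[category].append(value)

    flagged = []
    for category, values in groups.items():
        q1, _, q3 = quantiles(values, n=4, method="inclusive")
        spread = k * (q3 - q1)
        flagged += [
            (category, v) for v in values if v < q1 - spread or v > q3 + spread
        ]
    return flagged


# Hypothetical prices for two product categories.
rows = [("laptop", p) for p in (900, 950, 1000, 1050, 1100)]
rows += [("mouse", p) for p in (20, 22, 25, 27, 30, 400)]

# Within its own category, the 400 mouse stands out.
print(flag_outliers_by_group(rows))  # [('mouse', 400)]

# Treating all prices as a single group hides it among the laptop prices.
print(flag_outliers_by_group([("all", v) for _, v in rows]))  # []
```

A price of 400 is unremarkable across the whole catalog but clearly abnormal for a mouse, which is why grouping a numeric variable by a categorical one can surface outliers that an ungrouped check misses.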
But why is there so much focus on managing outliers? The answer is that very often they cause undesirable macroscopic effects on some statistical operations. The most...