Finding outliers in data
Outliers are values that are particularly extreme compared to the others, that is, values clearly distant from the other available observations. Outliers are a problem because they tend to distort the results of data analysis, in particular descriptive statistics and correlations. They should be identified in the data cleaning phase, but can also be dealt with in the next step of data analysis. Outliers can be univariate, when they have an extreme value for a single variable, or multivariate, when they have an unusual combination of values across several variables.
In terms of a distribution, outliers are the values that are extremely high or extremely low compared to the rest of the distribution, and thus represent isolated cases.
There are different methods for detecting outliers; we will use Tukey's method, which is based on the interquartile range (IQR). This method is not dependent on distribution...
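As a rough illustration of the IQR approach, here is a minimal Python sketch. It rests on several assumptions that the text above does not spell out: the fences are placed at 1.5 times the IQR below the first quartile and above the third quartile (the conventional multiplier for Tukey's rule), quartiles are computed with the standard library's "inclusive" method (other quartile conventions can give slightly different fences), and the function name `tukey_outliers` and the sample data are made up for the example.

```python
from statistics import quantiles


def tukey_outliers(values, k=1.5):
    """Return (lower_fence, upper_fence, outliers) using the IQR rule.

    k=1.5 is the conventional multiplier; it is an assumption here.
    """
    # Split the data into quartiles; the middle cut point (the median) is unused.
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1

    # Values beyond these fences are flagged as outliers.
    lower = q1 - k * iqr
    upper = q3 + k * iqr

    outliers = [v for v in values if v < lower or v > upper]
    return lower, upper, outliers


# Hypothetical sample with one obviously extreme value.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 100]
lower, upper, outliers = tukey_outliers(sample)
print(lower, upper, outliers)  # -3.0 13.0 [100]
```

Because the fences are built from quartiles rather than from the mean and standard deviation, the extreme value itself has little influence on where the fences fall.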