Using robust methods
Fortunately, there are some robust methods for analyzing datasets, which are generally less sensitive to extreme values. These robust statistical methods have been developed since the 1960s, but there are some well-known related methods from even earlier, like using the median instead of the mean as a measure of central tendency. Robust methods are often used when the underlying distribution of our data is not considered to follow the Gaussian curve, so the assumptions behind most good old regression models do not hold (see more details in Chapter 5, Building Models (authored by Renata Nemeth and Gergely Toth) and Chapter 6, Beyond the Linear Trend Line (authored by Renata Nemeth and Gergely Toth)).
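To illustrate why the median is considered more robust than the mean, here is a minimal sketch on a made-up vector. The values and the variable names x and x_out are hypothetical and are not taken from the miris dataset; only base R functions are used.

```r
# Hypothetical toy data: five typical values
x <- c(4.9, 5.0, 5.1, 5.2, 5.3)
# The same values plus one extreme observation
x_out <- c(x, 50)

# The mean is pulled strongly towards the extreme value
mean_shift <- mean(x_out) - mean(x)
# The median barely moves
median_shift <- median(x_out) - median(x)

# The same pattern holds for measures of spread:
# the standard deviation is inflated, while the
# median absolute deviation (MAD) stays small
spread <- c(sd = sd(x_out), mad = mad(x_out))

print(c(mean_shift = mean_shift, median_shift = median_shift))
print(spread)
```

A single extreme value out of six is enough to move the mean by several units here, while the median changes only slightly, which is the behavior that robust methods try to carry over to more complex models.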
Let's take the traditional linear regression example of predicting the sepal length of iris flowers based on the petal length, with some missing data. For this, we will use the previously defined miris dataset:
> summary(lm(Sepal.Length ~ Petal.Length, data = miris))

Call:
lm(formula = Sepal.Length ...