Handling Outliers
An outlier is a data point whose value differs markedly from the rest of the data. Outliers can distort the training process, so they need to be detected and handled deliberately. In the following section, we will illustrate via examples both the process of detecting outliers and the techniques used to handle them.
Exercise 16: Identifying Outlier Values
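The outliers package can detect outlier values. Its outlier() function returns, for each column, the value that lies farthest from the column mean. Setting the opposite=TRUE parameter returns the extreme value at the other end of each column instead. The outlier values can be verified using a boxplot.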
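To make the behavior described above concrete, the following is a minimal base R sketch of the same idea. It is not the package's own code: farthest_from_mean() is a hypothetical helper written for illustration, and the vector x is made-up data. The sketch picks whichever extreme (minimum or maximum) is farther from the mean, flips the choice when opposite=TRUE, and then cross-checks the result with boxplot.stats(), which reports the points a boxplot would draw beyond its whiskers.

```r
# Hypothetical helper (illustration only, not part of the outliers package):
# return the extreme value that lies farthest from the sample mean.
farthest_from_mean <- function(x, opposite = FALSE) {
  m  <- mean(x, na.rm = TRUE)
  lo <- min(x, na.rm = TRUE)
  hi <- max(x, na.rm = TRUE)
  # Pick the minimum when it is farther from the mean than the maximum;
  # opposite = TRUE flips that choice.
  pick_low <- xor((hi - m) < (m - lo), opposite)
  if (pick_low) lo else hi
}

# Made-up sample with one obviously large value
x <- c(10, 12, 11, 13, 12, 95)

farthest_from_mean(x)                   # extreme farthest from the mean
farthest_from_mean(x, opposite = TRUE)  # extreme at the other end

# Points that boxplot(x) would plot beyond the whiskers
boxplot.stats(x)$out
```

Note that a function of this kind always returns a value, even when the data contains nothing unusual, which is why checking the result against a boxplot is a useful second step.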
- Attach the outliers package:
library(outliers)
- Detect outliers:
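# Detect the most extreme value in each of the first four columns
# (assumes PimaIndiansDiabetes, from the mlbench package, is already loaded)
outlier(PimaIndiansDiabetes[,1:4])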
The output is as follows:
pregnant  glucose pressure  triceps
      17        0        0       99
- Detect outliers from the other end:
#This...