Outlier detection
Outliers must be taken into account in any analysis because they can bias the results. There are various ways to detect outliers in R, and the most common ones are discussed in this section.
Boxplot
Let us construct a boxplot for the variable Volume of Sampledata, which can be done by executing the following code:
> boxplot(Sampledata$Volume, main="Volume", boxwex=0.1)
The graph is as follows:
Figure 2.16: Boxplot for outlier detection
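An outlier is an observation that is distant from the rest of the data. In the preceding boxplot, the outliers are the points plotted beyond the fences (whiskers). By default, boxplot() extends each whisker to the most extreme data point that lies within 1.5 times the interquartile range of the box, so anything farther out is drawn as an individual point.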
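To list the flagged values instead of reading them off the plot, one option is boxplot.stats(), which computes the statistics that boxplot() draws. The following is a minimal sketch under assumptions: the vector volume is a made-up stand-in for Sampledata$Volume, not the data used in this chapter.

```r
# Hypothetical values standing in for Sampledata$Volume (assumption, not the book's data)
volume <- c(105, 98, 110, 102, 97, 101, 99, 104, 100, 250, 3)

# boxplot.stats() applies the same rule as boxplot(): coef = 1.5 by default
stats <- boxplot.stats(volume)

# Values that fall beyond the whiskers
outliers <- stats$out
print(outliers)

# Positions of those values in the original vector
print(which(volume %in% outliers))
```

With real data, replace volume with Sampledata$Volume. A larger coef argument in boxplot.stats() makes the rule less strict.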
LOF algorithm
The local outlier factor (LOF) is used for identifying density-based local outliers. In LOF, the local density of a point is compared with that of its neighbors. If the point lies in a sparser region than its neighbors, it is treated as an outlier. Let us consider some of the variables from...