Handling Redundant Features
Redundant features are features that are highly correlated with one another, so they carry largely the same information about the output variable. We can identify them by computing the correlation coefficients between pairs of features, and then keep only one feature from each highly correlated group.
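As a small illustration of the idea, the following sketch computes pairwise correlation coefficients with base R's cor(). The data frame toy and its columns x1, x2, and x3 are hypothetical values made up for this example: x2 is nearly a rescaled copy of x1, while x3 is not closely related to either.

```r
# Hypothetical toy data (not from any real dataset):
# x2 is almost exactly 2 * x1, x3 is a shuffled sequence
x1 <- c(1, 2, 3, 4, 5, 6, 7, 8)
x2 <- 2 * x1 + c(0.1, -0.2, 0.0, 0.3, -0.1, 0.2, -0.3, 0.0)
x3 <- c(5, 1, 7, 2, 8, 3, 6, 4)
toy <- data.frame(x1, x2, x3)

# Pairwise Pearson correlation coefficients between the features
toyCor <- cor(toy)
print(round(toyCor, 2))
```

In the printed matrix, the x1/x2 entry is close to 1, which marks the two features as redundant, while the entries involving x3 are much smaller.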
Exercise 30: Identifying Redundant Features
In this exercise, we will find groups of redundant features and remove all but one feature from each group.
- Attach the caret package:
# loading the library
library(caret)
- Load the GermanCredit dataset:
# load the German Credit Data
data(GermanCredit)
- Create a correlation matrix:
# calculating the correlation matrix for the first nine columns
correlationMatrix <- cor(GermanCredit[,1:9])
- Print the correlation matrix:
# printing the correlation matrix
print(correlationMatrix)
The output is as follows:
Figure 3.12: The correlation matrix
- To find attributes that are highly correlated, set the cutoff to 0.5:
# finding the attributes that are highly correlated
filterCorrelation <- findCorrelation(correlationMatrix, cutoff...
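What a correlation cutoff does can be pictured without caret. The following is a minimal base-R sketch, not caret's implementation: it builds a hypothetical data frame (toy, with made-up columns a, b, and c), lists the feature pairs whose absolute correlation exceeds 0.5, and drops the second feature of each such pair. findCorrelation applies its own rule, based on mean absolute correlation, to decide which feature of a pair to flag, so its choice may differ from this sketch.

```r
# Hypothetical toy data: b closely tracks a, c does not
toy <- data.frame(
  a = c(1, 2, 3, 4, 5, 6, 7, 8),
  b = c(2.1, 3.8, 6.0, 8.3, 9.9, 12.2, 13.7, 16.0),
  c = c(5, 1, 7, 2, 8, 3, 6, 4)
)
cutoff <- 0.5

# Absolute correlations; zero out the lower triangle and diagonal
# so that each pair of features is examined only once
corMat <- abs(cor(toy))
corMat[lower.tri(corMat, diag = TRUE)] <- 0
highPairs <- which(corMat > cutoff, arr.ind = TRUE)

# Report the flagged pairs by feature name
flagged <- data.frame(
  feature1 = rownames(corMat)[highPairs[, "row"]],
  feature2 = colnames(corMat)[highPairs[, "col"]]
)
print(flagged)

# Drop the second feature of each flagged pair.
# Selecting by name with setdiff() stays correct when nothing is flagged,
# whereas negative indexing with an empty vector would drop every column.
toDrop <- unique(flagged$feature2)
reduced <- toy[, setdiff(names(toy), toDrop), drop = FALSE]
print(names(reduced))
```

Selecting the columns to keep by name, rather than negating a vector of indices, is a deliberate choice here: it avoids accidentally returning an empty data frame when no pair exceeds the cutoff.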