Removing redundant variables using correlation matrices
In this recipe we will remove redundant variables by building a correlation matrix that identifies highly correlated variables.
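Modeler builds the matrix for you inside the stream. The underlying idea can also be sketched in plain Python, outside Modeler: compute the Pearson correlation between every pair of Input variables, then flag one variable from each pair whose absolute correlation exceeds a cutoff. The function names, the 0.9 threshold, the rule that keeps the first variable of a pair, and the sample columns are all hypothetical. They are not taken from the recipe, the stream, or nasadata.txt.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_matrix(data):
    """Return {(a, b): r} for every ordered pair of column names in `data`."""
    names = list(data)
    return {(a, b): pearson(data[a], data[b]) for a in names for b in names}


def redundant_variables(data, threshold=0.9):
    """Greedily mark one variable of each highly correlated pair as redundant.

    Hypothetical rule: columns are visited in dictionary order, and when
    |r| exceeds the threshold the later column of the pair is dropped.
    Pairs involving an already-dropped column are skipped.
    """
    matrix = correlation_matrix(data)
    dropped = set()
    for a, b in combinations(data, 2):
        if a in dropped or b in dropped:
            continue
        if abs(matrix[(a, b)]) > threshold:
            dropped.add(b)
    return dropped


# Hypothetical continuous Input fields (not from nasadata.txt).
# Only Input fields are passed in; the Target field is left out.
inputs = {
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "x2": [2.1, 4.0, 6.2, 7.9, 10.1],  # roughly 2 * x1, so nearly redundant
    "x3": [5.0, 3.0, 4.0, 1.0, 2.0],
}

print(sorted(redundant_variables(inputs)))  # ['x2']
```

Here x1 and x2 correlate at about 0.999, so x2 is flagged. The correlation between x1 and x3 is -0.8, which falls below the cutoff, so x3 is kept. Which member of a correlated pair to drop is a judgment call. Dropping the later column is only the simplest deterministic choice.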
Getting ready
This recipe uses the datafile nasadata.txt and the stream file recipe_variableselection_correlations.str.
You will need a copy of Microsoft Excel to visualize the correlation matrix.
How to do it...
To remove redundant variables using correlation matrices:
- Open the stream recipe_variableselection_correlations.str by navigating to File | Open Stream.
- Make sure the stream points to the correct path of the datafile nasadata.txt.
- Open the Type node named Correlation Types. Notice that several continuous variables have their direction set to Input, and a single continuous variable has its direction set to Target. The variable set to Target can be any variable that won't be an input to the model. If you don't have a good candidate, you can create a random variable and...