Feature selection
Feature selection is one of the toughest parts of building a financial model. It can be done statistically or by drawing on domain knowledge. Here we discuss only a few of the statistical feature selection methods used in the financial space.
Removing irrelevant features
Data may contain highly correlated features, and a model generally performs better when such features are not included together. The caret R package provides methods for identifying highly correlated features from the correlation matrix between them, as the following example shows.
The following code loads the data used for the correlation analysis and the multiple regression analysis and displays its first few rows:
> DataMR = read.csv("C:/Users/prashant.vats/Desktop/Projects/BOOK R/DataForMultipleRegression.csv")
> head(DataMR)
| 1 | 80.13 | 72.86 | 93.1 | 63.7 | 83.1 |
| 2 | 79.57 | 72.88 | 90.2 | 63.5 | 82   |
| 3 | 79.93 | 71.72 | 99   | 64... |
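The idea of dropping highly correlated features can be sketched in base R. What follows is a simplified greedy filter written only for illustration. The function name `drop_correlated`, the 0.9 cutoff, and the synthetic columns `x1` to `x4` are all assumptions. They are not the book's data, and the function is not caret's exact algorithm. At each step the filter finds the most strongly correlated pair above the cutoff and drops the member whose mean absolute correlation with the remaining features is larger.

```r
# Hypothetical helper: a simplified greedy correlation filter (illustrative,
# not caret's exact algorithm). Returns the names of the columns to keep.
drop_correlated <- function(df, cutoff = 0.9) {
  keep <- names(df)
  repeat {
    if (length(keep) < 2) break
    cm <- abs(cor(df[, keep, drop = FALSE]))  # absolute correlation matrix
    diag(cm) <- 0                             # ignore self-correlation
    if (max(cm) <= cutoff) break              # nothing left above the cutoff
    # Most strongly correlated pair (cm is symmetric, so take the first hit)
    idx <- which(cm == max(cm), arr.ind = TRUE)[1, ]
    pair <- keep[idx]
    # Mean absolute correlation of each pair member with all remaining features
    mean_cor <- rowMeans(cm[pair, , drop = FALSE])
    # Drop the member that is more correlated with everything else
    keep <- setdiff(keep, pair[which.max(mean_cor)])
  }
  keep
}

# Synthetic example data (assumed): x2 is almost a copy of x1
set.seed(42)
n <- 200
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.05)
x3 <- rnorm(n)
x4 <- rnorm(n)
df <- data.frame(x1, x2, x3, x4)

kept <- drop_correlated(df, cutoff = 0.9)
print(kept)  # one of x1/x2 is removed; x3 and x4 are kept
```

If caret is installed, `caret::findCorrelation(cor(df), cutoff = 0.9)` returns the column indices that caret suggests removing. It applies its own rule for choosing which member of a correlated pair to drop, so its choice may differ from this sketch.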