Test your knowledge
We learned a lot of feature selection and engineering techniques here. To practice some of these, use the same loans dataset with the LOAN_DEFAULT column as the target variable, and perform the following:
- Plot the ANOVA F-scores and p-values for the numeric features.
- Evaluate the STATE_ID feature and decide on the number of top state IDs to keep. Put all other state IDs in another column, and then one-hot encode this feature. Finally, join it together with your original loan DataFrame.
- Extract the day of week from the DISBURSAL_DATE feature and check the phi-k correlation. Comment on the strength of the correlation and if this new feature should be used.
As always, write some analysis explaining and interpreting the results.
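If you are unsure where to start, the three tasks above could be approached roughly as in the following sketch. It is only one possible outline, not a reference solution. It runs on a small synthetic stand-in for the loans data, so the AMOUNT and AGE columns, the date range, and the choice of top_n = 3 are all made-up assumptions for illustration. It also interprets "another column" as a single "other" category for the less frequent state IDs, which is an assumption about the intent of the exercise.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import f_classif

# Synthetic stand-in for the loans data (assumption: the real columns and
# values differ; AMOUNT and AGE are hypothetical numeric features).
rng = np.random.default_rng(42)
n = 500
loan_df = pd.DataFrame({
    "LOAN_DEFAULT": rng.integers(0, 2, size=n),
    "STATE_ID": rng.choice([1, 2, 3, 4, 5, 6], size=n,
                           p=[0.4, 0.3, 0.15, 0.05, 0.05, 0.05]),
    "DISBURSAL_DATE": pd.Timestamp("2018-08-01")
    + pd.to_timedelta(rng.integers(0, 90, size=n), unit="D"),
    "AMOUNT": rng.normal(50_000, 10_000, size=n),
    "AGE": rng.integers(21, 65, size=n).astype(float),
})

# 1. ANOVA F-scores and p-values for the numeric features vs. the target.
numeric_cols = ["AMOUNT", "AGE"]
f_scores, p_values = f_classif(loan_df[numeric_cols], loan_df["LOAN_DEFAULT"])
anova = pd.DataFrame({"f_score": f_scores, "p_value": p_values},
                     index=numeric_cols)
# With matplotlib installed, anova["f_score"].plot.bar() and
# anova["p_value"].plot.bar() would draw the two plots.

# 2. Keep the top-N most frequent state IDs and group the rest as "other".
top_n = 3  # hypothetical choice; inspect value_counts() before deciding
states = loan_df["STATE_ID"].astype(str)
top_states = states.value_counts().index[:top_n]
state_grouped = states.where(states.isin(top_states), other="other")

# One-hot encode the grouped feature and join it to the original DataFrame.
state_dummies = pd.get_dummies(state_grouped, prefix="STATE_ID")
loan_df = pd.concat([loan_df, state_dummies], axis=1)

# 3. Day of week (0 = Monday, 6 = Sunday). If the real column holds strings,
# convert it with pd.to_datetime() first.
loan_df["DISBURSAL_DOW"] = loan_df["DISBURSAL_DATE"].dt.dayofweek

# The phi-k check is left as a comment because it needs the separate phik
# package; with it installed, something like the following should work:
#   import phik
#   loan_df[["DISBURSAL_DOW", "LOAN_DEFAULT"]].phik_matrix()

print(anova)
print(state_dummies.sum())
print(loan_df["DISBURSAL_DOW"].value_counts().sort_index())
```

Because the target here is random noise, the F-scores will be small and the p-values uninformative. On the real data, the interesting part is comparing these numbers across features and explaining what they imply.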
Using another loan default dataset (the "default of credit card clients.xls" file in the GitHub repository for the book under Chapter10/test_your_knowledge/data), perform PCA and examine the explained...