At this point, you may be feeling a bit overwhelmed by the information in this chapter. We have presented several ways of performing feature selection, some based on pure statistics and others based on the output of secondary machine learning models. It is natural to wonder how to decide which feature selection method is right for your data. Ideally, you would try multiple options, as we did in this chapter, but we understand that this is not always feasible. The following are some rules of thumb that you can follow when prioritizing which feature selection method is most likely to give better results:
- If your features are mostly categorical, start with SelectKBest using a chi-squared (chi2) ranker, or with a tree-based model selector.
- If your features are largely...