We saw how feature selection works with L1-regularized logistic regression in a previous section, where 574 out of the 2820 ad click features were chosen as the more important ones. This is because, with L1 regularization, the weights of less important features are compressed to close to, or exactly, 0. Besides L1-regularized logistic regression, random forest is another frequently used feature selection technique.
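As a quick reminder, the following is a minimal sketch of that idea with scikit-learn. The dataset here is synthetic and only stands in for the ad click features, and the penalty strength C=0.1 is an arbitrary choice for illustration:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the ad click dataset (hypothetical shapes)
X, y = make_classification(n_samples=10000, n_features=100,
                           n_informative=20, random_state=42)

# The L1 penalty drives the weights of less important features to exactly 0
l1_logreg = LogisticRegression(penalty='l1', solver='liblinear', C=0.1,
                               random_state=42)
l1_logreg.fit(X, y)

# Features with non-zero coefficients are the ones that were selected
selected = np.flatnonzero(l1_logreg.coef_)
print(f'{len(selected)} out of {X.shape[1]} features selected')
```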
To recap, random forest is bagging over a set of individual decision trees. Each tree considers a random subset of the features when searching for the best splitting point at each node. And, by the nature of the decision tree algorithm, only the significant features (along with their splitting values) are used to constitute tree nodes. Considering the forest as a whole, the more frequently a feature is used in a tree node, the more important it is. In other words, we can rank the importance of features based on how often they appear in tree nodes across all trees, as in the sketch below.
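Here is a minimal sketch of ranking features with a random forest in scikit-learn, again on synthetic data standing in for the ad click features. Note that scikit-learn's feature_importances_ attribute scores each feature by its average impurity reduction across the trees rather than by a raw count of node occurrences, but it serves the same ranking purpose:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the ad click dataset (hypothetical shapes)
X, y = make_classification(n_samples=10000, n_features=100,
                           n_informative=20, random_state=42)

# Train a random forest; each tree considers a random subset of features
# when looking for the best split at each node
forest = RandomForestClassifier(n_estimators=100, criterion='gini',
                                min_samples_split=30, n_jobs=-1,
                                random_state=42)
forest.fit(X, y)

# Aggregate importance scores and rank features from most to least important
importances = forest.feature_importances_
ranking = np.argsort(importances)[::-1]
print('Top 10 features:', ranking[:10])
print('Their importance scores:', importances[ranking[:10]])
```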