In this chapter, we started with an introduction to a typical machine learning problem, online advertising click-through prediction, and its inherent challenges, including categorical features. We then turned to tree-based algorithms, which can take in both numerical and categorical features. Next came an in-depth discussion of the decision tree algorithm: its mechanics, different types, how to construct a tree, and two metrics, Gini impurity and entropy, that measure the effectiveness of a split at a tree node. After constructing a tree by hand in an example, we implemented the algorithm from scratch. We also learned how to use the decision tree package from scikit-learn and applied it to predict click-through. We then improved performance by adopting the feature-based bagging algorithm, random forest. The chapter ended with tips for tuning a random forest model.
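As a quick recap of the workflow covered in this chapter, here is a minimal sketch of training a decision tree and a tuned random forest with scikit-learn. It uses a synthetic dataset from `make_classification` as a stand-in for the click-through data, and the specific hyperparameter grid is illustrative rather than the chapter's exact settings:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

# Synthetic binary classification data as a placeholder for ad-click data
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

# A single decision tree; 'gini' and 'entropy' correspond to the two
# split-quality metrics discussed in the chapter
tree = DecisionTreeClassifier(criterion='gini', max_depth=10, random_state=42)
tree.fit(X_train, y_train)
print('Decision tree AUC:',
      roc_auc_score(y_test, tree.predict_proba(X_test)[:, 1]))

# Random forest: an ensemble of trees trained on bootstrap samples, with
# each split chosen from a random subset of features. The grid below
# illustrates typical tuning knobs (an assumed, illustrative grid).
param_grid = {
    'n_estimators': [100, 300],
    'max_depth': [None, 10],
    'min_samples_split': [2, 10],
}
search = GridSearchCV(RandomForestClassifier(random_state=42),
                      param_grid, scoring='roc_auc', cv=3)
search.fit(X_train, y_train)
print('Best parameters:', search.best_params_)
print('Random forest AUC:',
      roc_auc_score(y_test,
                    search.best_estimator_.predict_proba(X_test)[:, 1]))
```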
More practice is always good for honing...