A brief primer on tree-based methods
No chapter on structured data would be complete without mentioning tree-based methods, such as random forests or XGBoost.
It is worth knowing about them because, in the realm of predictive modeling for structured data, tree-based methods are very successful. However, they do not perform as well on more advanced tasks, such as image recognition or sequence-to-sequence modeling, which is why the rest of this book does not deal with them.
Note
For a deeper dive into XGBoost, check out the tutorials on the XGBoost documentation page: http://xgboost.readthedocs.io. There is a nice explanation of how tree-based methods and gradient boosting work in theory and practice under the Tutorials section of the website.
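As a quick illustration (separate from the XGBoost tutorials themselves), the sketch below shows how a gradient-boosted tree classifier might be trained with XGBoost's scikit-learn-style wrapper. The synthetic dataset and the hyperparameter values are placeholders for illustration, not recommendations.

```python
import numpy as np
import xgboost as xgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Placeholder structured dataset: 1,000 rows, 10 numeric features,
# and a binary target. Replace with your own tabular data.
rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 10))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Gradient-boosted trees via the scikit-learn-compatible API.
# Hyperparameters here are illustrative defaults, not tuned values.
model = xgb.XGBClassifier(
    n_estimators=100,   # number of boosting rounds (trees)
    max_depth=3,        # depth of each individual tree
    learning_rate=0.1,  # shrinkage applied to each tree's contribution
)
model.fit(X_train, y_train)

preds = model.predict(X_test)
print("Test accuracy:", accuracy_score(y_test, preds))
```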
A simple decision tree
The basic idea behind tree-based methods is the decision tree. A decision tree splits the data on feature values so that the difference in outcomes between the resulting groups is as large as possible.
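To make this concrete, here is a minimal sketch of fitting a single decision tree with scikit-learn. The tiny fraud-style dataset, the isNight values, and the amount feature are made up for illustration; only the mechanics of fitting the tree and inspecting its splits matter.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy transaction data (illustrative only): two features,
# isNight (0/1) and amount, and a binary fraud label.
X = np.array([
    [1, 850.0],
    [1, 920.0],
    [0, 15.0],
    [0, 40.0],
    [1, 30.0],
    [0, 700.0],
])
y = np.array([1, 1, 0, 0, 0, 0])  # 1 = fraud, 0 = legitimate

# A shallow tree: each split is chosen to separate the outcomes
# (fraud vs. not fraud) as cleanly as possible.
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(X, y)

# Print the learned splits as plain text.
print(export_text(tree, feature_names=["isNight", "amount"]))
```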
Let's assume for a second that our isNight feature is the greatest...