An overview of the techniques
We will now walk through an overview of the techniques, covering regression and classification trees, random forests, and gradient boosting. This will set the stage for the practical business cases.
Regression trees
To establish an understanding of tree-based methods, it is probably easier to start with a quantitative outcome and then move on to how it works in a classification problem. The essence of a tree is that the features are partitioned, starting with the first split that improves the RSS the most. These binary splits continue until the tree terminates. Each subsequent split/partition is not done on the entire dataset but only on the portion of the prior split that it falls under. This top-down process is referred to as recursive partitioning. It is also a process that is greedy, a term you may stumble upon in reading about machine learning methods. Greedy means that, at each split in the process, the algorithm looks for the greatest reduction in RSS at that point, without considering how well a split might perform on later partitions.
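As an illustration of this greedy, RSS-driven split search (a minimal sketch, not the tree implementation used later in the chapter), the following Python snippet scans every candidate split point on a single feature and keeps the one that reduces the total RSS the most; the data and the best_split function are hypothetical and exist only for this example.

import numpy as np

def best_split(x, y):
    """Greedy search for the single split point on feature x that
    minimizes the total residual sum of squares (RSS) of the two
    resulting partitions."""
    best_rss, best_threshold = np.inf, None
    for threshold in np.unique(x):
        left, right = y[x <= threshold], y[x > threshold]
        if len(left) == 0 or len(right) == 0:
            continue  # skip splits that leave one side empty
        # RSS of a partition: squared deviations from its mean prediction
        rss = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if rss < best_rss:
            best_rss, best_threshold = rss, threshold
    return best_threshold, best_rss

# Hypothetical data: a noisy step function with a break near x = 5
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 200)
y = np.where(x < 5, 2.0, 8.0) + rng.normal(0, 0.5, 200)

threshold, rss = best_split(x, y)
print(f"Best split at x <= {threshold:.2f} with RSS {rss:.2f}")

A full regression tree simply repeats this same search within each resulting partition until a stopping rule ends the process, which is exactly what the name recursive partitioning captures.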