Advanced boosting algorithm – DART
DART is an extension of the standard GBDT algorithm discussed in the previous section [4]. DART employs dropouts, a technique borrowed from deep learning (DL), to keep the decision tree ensemble from overfitting. The extension is straightforward and consists of two parts. First, when fitting the next tree, $M_{n+1}(x)$, standard GBDT computes residuals against the scaled sum of all previous trees $M_1, \dots, M_n$; DART instead uses a random subset of the previous trees, with the rest dropped from the sum. The $p_{\mathrm{drop}}$ parameter controls the probability of each previous tree being dropped. The second part of the DART algorithm is to apply additional scaling to the contribution of the new tree. Let $k$ be the number of trees dropped when the new tree, $M_{n+1}$, was fitted. Since $M_{n+1}$ was fitted without the contribution of those $k$ trees, when updating the full prediction, $F_{n+1}$, which includes all trees, the prediction overshoots. Therefore, the new tree is scaled by a factor of $1/(k+1)$, and each of the $k$ dropped trees is scaled by $k/(k+1)$, so that the combined contribution of the dropped trees and the new tree stays at the same magnitude as the dropped trees alone.
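The sketch below illustrates these two steps for a regression setting with squared loss, using NumPy and scikit-learn trees as weak learners. The function names (`dart_fit`, `dart_predict`), the fixed tree depth, and the omission of a learning rate are simplifying assumptions for illustration, not part of any library API.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

def dart_fit(X, y, n_trees=50, p_drop=0.1, max_depth=3):
    """Illustrative DART boosting loop for squared loss (not a library API)."""
    trees, weights = [], []  # ensemble members and their scale factors
    for _ in range(n_trees):
        # Part 1: drop each previous tree independently with probability p_drop.
        drop_mask = rng.random(len(trees)) < p_drop
        dropped = np.flatnonzero(drop_mask)
        kept = np.flatnonzero(~drop_mask)
        # Prediction of the reduced ensemble, with the dropped trees excluded.
        pred = np.zeros(len(y))
        for i in kept:
            pred += weights[i] * trees[i].predict(X)
        # Fit the new tree M_{n+1} to the residuals of the reduced ensemble.
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, y - pred)
        # Part 2: rescale so the full prediction F_{n+1} does not overshoot.
        k = len(dropped)
        trees.append(tree)
        weights.append(1.0 / (k + 1))   # shrink the new tree
        for i in dropped:
            weights[i] *= k / (k + 1)   # shrink the dropped trees
    return trees, weights

def dart_predict(X, trees, weights):
    # Full prediction: scaled sum over every tree in the ensemble.
    return sum(w * t.predict(X) for w, t in zip(weights, trees))
```

In practice, the DART implementations shipped with XGBoost (`booster="dart"` with `rate_drop`) and LightGBM (`boosting_type="dart"` with `drop_rate`) perform the same two steps, with additional options such as a learning rate folded into the normalization, and should be preferred over a hand-rolled loop.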