If I were ever stranded on a desert island and had to pick one algorithm to take with me, I'd definitely choose the gradient boosting ensemble! It has proven to work very well on many classification and regression problems. We are going to use it with the same automobile data from the previous sections. The classifier and regressor versions of this ensemble share the exact same hyperparameters, except for the loss functions they use. This means that everything we learn here will carry over whenever we decide to use gradient boosting ensembles for classification.
Unlike the averaging ensembles we have seen so far, boosting ensembles build their estimators iteratively: each new estimator is trained to correct the errors of the ones built before it. This sequential dependence is the main downside of boosting ensembles, since the estimators cannot be trained in parallel. Putting parallelism aside, this iterative nature of the ensemble...
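The iterative idea can be sketched by hand: fit a shallow tree, subtract its (shrunken) predictions from the targets, and fit the next tree on what remains. This is an illustrative toy loop on made-up data, not the library's actual implementation.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

learning_rate = 0.5
trees = []
residual = y.copy()
for _ in range(5):
    # Each tree is fit on the residual left by its predecessors,
    # so it cannot be trained until they are finished — no parallelism.
    tree = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, residual)
    trees.append(tree)
    residual -= learning_rate * tree.predict(X)

def predict(X_new):
    # The ensemble prediction is the sum of the shrunken tree predictions
    return sum(learning_rate * t.predict(X_new) for t in trees)

mse = np.mean((predict(X) - y) ** 2)
```

Each pass drives the residual closer to zero, which is why the training error keeps shrinking as trees are added.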