Bagging and random forests
Bagging is an ensemble method in which multiple models are trained on subsets of the training data. The models’ predictions are combined into a final prediction, usually by averaging the predictions (for regression) or by taking a majority vote for a class (for classification). When training each model, we select a subset of the original training dataset with replacement—that is, a specific training pattern may appear multiple times within one subset and can be a member of multiple subsets. Since each model only sees a sample of the training data, no single model can “memorize” the full training set, which reduces overfitting. The following diagram illustrates the bagging process:
Figure 2.1 – Illustration of the bagging process; each independent classifier is trained on a random subsample from the training data and a final prediction is made by aggregating the predictions of all classifiers
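To make the process concrete, here is a minimal from-scratch sketch of bagging for binary classification using NumPy. The `DecisionStump` weak learner and the function names are illustrative choices for this sketch, not part of any library; in practice you would typically use a ready-made implementation such as scikit-learn’s `BaggingClassifier`.

```python
import numpy as np

class DecisionStump:
    """A deliberately weak learner: thresholds a single feature."""

    def fit(self, X, y):
        best_err = np.inf
        for f in range(X.shape[1]):          # try every feature...
            for t in np.unique(X[:, f]):     # ...and every candidate threshold
                for flip in (False, True):   # ...in both directions
                    pred = (X[:, f] > t).astype(int)
                    if flip:
                        pred = 1 - pred
                    err = np.mean(pred != y)
                    if err < best_err:
                        best_err = err
                        self.f, self.t, self.flip = f, t, flip
        return self

    def predict(self, X):
        pred = (X[:, self.f] > self.t).astype(int)
        return 1 - pred if self.flip else pred

def fit_bagging(X, y, n_models=25, rng=None):
    if rng is None:
        rng = np.random.default_rng(0)
    models = []
    for _ in range(n_models):
        # Bootstrap: draw len(X) indices *with replacement*, so a given
        # training pattern may appear several times in one subset,
        # or not at all.
        idx = rng.integers(0, len(X), size=len(X))
        models.append(DecisionStump().fit(X[idx], y[idx]))
    return models

def predict_bagging(models, X):
    # Aggregate by majority vote over the models' class predictions.
    votes = np.stack([m.predict(X) for m in models])
    return (votes.mean(axis=0) > 0.5).astype(int)

# Toy data: class 1 where the two features sum to more than 1.
rng = np.random.default_rng(1)
X = rng.random((200, 2))
y = ((X[:, 0] + X[:, 1]) > 1.0).astype(int)

models = fit_bagging(X, y, n_models=25, rng=rng)
accuracy = np.mean(predict_bagging(models, X) == y)
```

Because each stump is trained on a different bootstrap sample, the individual models disagree slightly, and the majority vote is typically more accurate than any single stump on this diagonal decision boundary.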
Each model in a bagging...