Bootstrapping
The bootstrap method refers to drawing multiple samples (each known as a resample) from the original dataset by selecting data points at random with replacement. Because selection is with replacement, a given data point can appear in more than one resample (and even more than once within the same resample), and every data point has an equal probability of being selected on each draw:
From the previous diagram, we can see that each of the five bootstrapped samples taken from the primary dataset is different and has its own characteristics. Training a model on each of these resamples would therefore produce different predictions.
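The following is a minimal sketch of how such resamples can be generated in practice, assuming NumPy and a small hypothetical dataset of ten values; the dataset values and the number of resamples are illustrative only:

```python
import numpy as np

# A small illustrative dataset of ten observations (hypothetical values).
data = np.array([2.3, 4.1, 5.0, 3.7, 6.2, 1.9, 4.8, 5.5, 2.8, 3.3])

rng = np.random.default_rng(seed=42)
n_resamples = 5

# Each resample is the same size as the original dataset and is drawn
# with replacement, so individual values may repeat within a resample.
resamples = [rng.choice(data, size=len(data), replace=True)
             for _ in range(n_resamples)]

for i, resample in enumerate(resamples, start=1):
    print(f"Resample {i}: {resample}  mean={resample.mean():.2f}")
```

Running this prints five resamples whose means differ slightly from one another and from the original dataset's mean, which mirrors the behavior shown in the diagram.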
The following are the advantages of bootstrapping:
- Each resample can have characteristics that differ from those of the entire dataset, giving us a different perspective on how the data behaves.
- Algorithms that make use of bootstrapping tend to be more robust and handle unseen data better, especially on smaller datasets that have...