Understanding the bootstrapping and bagging techniques
Bootstrapping is a pictorial term: it evokes the image of someone pulling themselves up by their own bootstraps. In other words, if no one is going to help us, we help ourselves. In statistics, however, bootstrapping is a sampling method: if we do not have enough data, we help ourselves by generating more datasets from the data we already have.
Imagine that you have a small dataset and you want to build a classifier or estimator with this limited amount of data. You could perform cross-validation, but techniques such as 10-fold cross-validation shrink the number of records available in each fold even further. Alternatively, you could use all of the data as training data, but the resulting model would likely have very high variance. What should we do, then?
The bootstrapping method asks: if the dataset at hand is itself just a sample from some unknown underlying distribution, why not resample from it in turn? The bootstrap method creates new training sets by uniformly sampling records from the original dataset with replacement.
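
To make this concrete, here is a minimal sketch of drawing bootstrap samples with NumPy; the toy dataset, the seed, and the number of resamples are made up purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # seed chosen arbitrarily, for reproducibility

# A hypothetical small dataset of 10 observations.
data = np.array([2.3, 1.8, 3.1, 2.9, 4.0, 1.5, 3.6, 2.2, 2.7, 3.3])

# Each bootstrap sample draws len(data) records uniformly *with replacement*,
# so some observations appear more than once and others not at all.
n_resamples = 3  # illustrative; in practice you would draw many more
for i in range(n_resamples):
    bootstrap_sample = rng.choice(data, size=len(data), replace=True)
    print(f"bootstrap sample {i}: {bootstrap_sample}")
```

Because the sampling is done with replacement, each bootstrap sample leaves out roughly 1/e, or about 36.8%, of the distinct original records on average, while duplicating others to keep the sample the same size as the original dataset.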