Modeling using random forests
Random forests, also known as random decision forests, are a machine learning algorithm from the family of ensemble learning methods. They can be used for both classification and regression tasks. A random forest is simply a collection, or ensemble, of decision trees, hence the name.
The working of the algorithm can be described briefly as follows. Each tree in the ensemble is built from a bootstrap sample, that is, a sample drawn with replacement from the training dataset. During the construction of each tree, a split is no longer chosen as the best split among all the features; instead, the best split is chosen from a random subset of the features at each node. This injection of randomness increases the bias of the model slightly but decreases its variance greatly, which prevents overfitting, a common problem with individual decision trees.
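The following is a minimal sketch of these ideas using scikit-learn's RandomForestClassifier. The Iris dataset and the specific parameter values used here are illustrative assumptions rather than part of the preceding discussion:

# A minimal sketch of a random forest with scikit-learn.
# The dataset (Iris) and parameter values are illustrative assumptions.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# bootstrap=True: each tree is grown on a bootstrap sample
# (sampling with replacement) of the training data.
# max_features='sqrt': at every node, only a random subset of the
# features is considered when searching for the best split.
rf = RandomForestClassifier(
    n_estimators=100,
    bootstrap=True,
    max_features='sqrt',
    random_state=42)
rf.fit(X_train, y_train)

print('Test accuracy:', rf.score(X_test, y_test))

Averaging the predictions of many such randomized trees is what trades a small increase in bias for a large reduction in variance compared with a single fully grown decision tree.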