We have talked about decision trees, and now it's time to discuss random forests. At its most basic, a random forest is a collection of decision trees. To build each tree, a random sample of the rows and a random subset of the features are selected, and a decision tree is trained on that subset. The forest then aggregates the predictions of all its trees into a single result, for example by majority vote for classification or by averaging for regression.
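Here is a minimal sketch of this idea using scikit-learn. The dataset is synthetic and the hyperparameter values are illustrative assumptions, not recommendations. Note that scikit-learn draws the random feature subset at each split rather than once per tree, but the overall effect described above is the same.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# A synthetic dataset with 30 features, just for illustration.
X, y = make_classification(n_samples=1000, n_features=30, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

forest = RandomForestClassifier(
    n_estimators=100,   # the collection: 100 decision trees
    max_features=10,    # each split considers a random subset of 10 features
    bootstrap=True,     # each tree trains on a random sample of the rows
    random_state=42,
)
forest.fit(X_train, y_train)

# The forest aggregates the individual trees' predictions into a single result.
print("Test accuracy:", forest.score(X_test, y_test))
```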
Random forests also reduce variance compared with a single decision tree, without adding much bias. How do they do this? By training each tree on a different sample of the data and on a random subset of the features. Let's take an example. Say we have 30 features. Each tree might only use 10 of them. That leaves 20 features unused in that tree, and some of those 20 features might be important. But remember that a random forest is a collection of decision trees. So even though any one tree only sees 10 features, across many trees most, if not all, of the features end up being used.
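You can check this on the sketch above by looking at which features each fitted tree actually split on and taking the union across the forest; with enough trees, it usually covers most or all of the 30 features. The snippet below assumes the `forest` fitted earlier and relies on the `tree_.feature` attribute of each fitted estimator.

```python
# Collect the feature indices each tree split on somewhere in its structure.
used_features = set()
for tree in forest.estimators_:
    # tree_.feature holds the feature index used at each internal node;
    # leaf nodes are marked with a negative value, so we skip them.
    used_features.update(int(f) for f in tree.tree_.feature if f >= 0)

print(f"Features used somewhere in the forest: {len(used_features)} of 30")
```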