To demonstrate the random forest classifier, we are going to use a synthetic dataset. We first create the dataset using the built-in make_hastie_10_2 function:
from sklearn.datasets import make_hastie_10_2
x, y = make_hastie_10_2(n_samples=6000, random_state=42)
The previous code snippet creates a random dataset of 6,000 samples. I set random_state to a fixed number to make sure we both get the same random data. Now, we can split the resulting data into training and test sets:
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=42)
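Before moving on, it can help to confirm what these two calls produced. The following is a minimal sketch rather than part of the original walkthrough: it repeats the two calls above so that it runs on its own, and the NumPy import and the print statements are additions made purely for inspection.

```python
import numpy as np
from sklearn.datasets import make_hastie_10_2
from sklearn.model_selection import train_test_split

# Recreate the dataset and the split exactly as above
x, y = make_hastie_10_2(n_samples=6000, random_state=42)
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.25, random_state=42
)

# The generator always produces 10 features per sample
print(x.shape, y.shape)

# The targets are binary and encoded as -1 and 1
print(np.unique(y))

# test_size=0.25 holds out a quarter of the 6,000 samples
print(x_train.shape, x_test.shape)
```

With these settings, we should end up with 4,500 training samples and 1,500 test samples, each with 10 features. Note that the labels are -1 and 1 rather than 0 and 1, which is worth keeping in mind when computing metrics later.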
Then, to evaluate the classifier, we are going to use a new concept called the Receiver Operating Characteristic (ROC) curve, which is introduced in the next section.
The ROC curve
"Probability is expectation founded upon partial knowledge. A perfect acquaintance with all the circumstances affecting the occurrence...