Training and validating the model
Model training gets the most attention when people talk about ML, but once the data has been collected and prepared, it is usually the easiest step. A lot of time and energy can and should be spent on optimizing your models through hyperparameter tuning. Whichever model you are interested in learning about and using, do some research on how to tune it and on any additional data preparation steps it requires.
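As a rough illustration of what hyperparameter tuning can look like, here is a minimal sketch using scikit-learn's GridSearchCV with a Random Forest. The synthetic data from make_classification and the values in param_grid are assumptions made purely so the example runs on its own; they are not the data or settings used in this chapter.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data (an assumption): swap in your own X and y.
X, y = make_classification(n_samples=300, n_features=10, random_state=1337)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1337
)

# A deliberately small, illustrative grid; these values are not recommendations.
param_grid = {
    'n_estimators': [50, 100],
    'max_depth': [None, 5],
    'min_samples_leaf': [1, 3],
}

# Try every combination in the grid, scoring each with 3-fold cross-validation.
search = GridSearchCV(
    RandomForestClassifier(random_state=1337),
    param_grid,
    cv=3,
    scoring='accuracy',
)
search.fit(X_train, y_train)

best_params = search.best_params_
best_cv_score = search.best_score_
# After fitting, the search object predicts with the best model it found.
test_acc = search.score(X_test, y_test)

print('best_params: {}'.format(best_params))
print('cv_acc: {:.3f} - test_acc: {:.3f}'.format(best_cv_score, test_acc))
```

Grid search is only one option. For larger grids, RandomizedSearchCV samples combinations instead of trying them all, which is usually cheaper.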
With this simple network, the default Random Forest model needed no tuning. I ran through several checks, and the default model did well enough. Here’s the code:
from sklearn.ensemble import RandomForestClassifier

clf = RandomForestClassifier(random_state=1337, n_jobs=-1, n_estimators=100)
clf.fit(X_train, y_train)

train_acc = clf.score(X_train, y_train)
test_acc = clf.score(X_test, y_test)

print('train_acc: {} - test_acc: {}'.format(train_acc, test_acc))
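A single train/test split can be optimistic or pessimistic by chance, so one extra check worth considering is k-fold cross-validation. The sketch below is an assumption about what such a check might look like, not a record of the checks actually run for this chapter. It uses synthetic data from make_classification so that it is self-contained.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data (an assumption): swap in your own X and y.
X, y = make_classification(n_samples=300, n_features=10, random_state=1337)

clf = RandomForestClassifier(random_state=1337, n_estimators=100)

# Fit and score the model on 5 different train/validation splits.
scores = cross_val_score(clf, X, y, cv=5, scoring='accuracy')

print('fold accuracies: {}'.format(scores))
print('cv_acc: {:.3f} +/- {:.3f}'.format(scores.mean(), scores.std()))
```

If the fold accuracies vary widely, or sit well below the training accuracy, the model is likely overfitting or the dataset is too small for a single split to be trusted.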
We are using a Random Forest classifier, so we first need to...