Out-of-core CART with H2O
Up until now, we have only dealt with desktop solutions for CART models. In Chapter 4, Neural Networks and Deep Learning, we introduced H2O as a powerful, scalable framework for out-of-memory deep learning. Luckily, H2O also provides tree ensemble methods that take advantage of its parallel Hadoop ecosystem. As we covered GBM and random forest extensively in previous sections, let's get to it right away. For this exercise, we will reuse the spam dataset from before.
Random forest and gridsearch on H2O
Let's implement a random forest with gridsearch hyperparameter optimization. In this section, we first load the spam dataset from the URL source:
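import os
import numpy as np
import pandas as pd
import xlrd
import h2o

# urlretrieve lives in urllib.request on Python 3 and in urllib on Python 2
try:
    from urllib.request import urlretrieve
except ImportError:
    from urllib import urlretrieve

# set your path here
os.chdir('/yourpath/')

# download the spam dataset into the working directory
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/spambase/spambase.data'
filename = 'spamdata.data'
urlretrieve(url, filename)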
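Before handing the file to H2O, it can be worth confirming that the download is complete and well formed. The following is a minimal sketch of such a check in Python 3, using only the standard library; the helper name describe_csv is hypothetical and not part of H2O or pandas, and it assumes the file is comma-separated with no header row.

```python
import csv


def describe_csv(path):
    """Return (n_rows, n_columns) for a headerless comma-separated file.

    Hypothetical helper for illustration; raises ValueError if the
    rows do not all have the same number of fields.
    """
    n_rows = 0
    widths = set()
    with open(path, newline='') as handle:
        for record in csv.reader(handle):
            if not record:  # skip blank lines, e.g. a trailing newline
                continue
            n_rows += 1
            widths.add(len(record))
    if len(widths) > 1:
        raise ValueError('inconsistent column counts: %s' % sorted(widths))
    n_columns = widths.pop() if widths else 0
    return n_rows, n_columns


# Example call, assuming the download above succeeded:
#     n_rows, n_columns = describe_csv('spamdata.data')
```

According to the UCI description of the spambase data, the file should contain 4,601 rows and 58 columns (57 features plus the spam label), so a different result would suggest a truncated or corrupted download.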
Now that we have loaded the data, we can...