K-means with H2O
Here, we compare the K-means implementation of H2O with that of Scikit-learn. More specifically, we run the mini-batch experiment using H2OKMeansEstimator, the object for K-means available in H2O. The setup is similar to the one shown in the PCA with H2O section, and the experiment is the same as in the preceding section:
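In:
import os
import tempfile
import time

import numpy as np
import pandas as pd
import h2o
from h2o.estimators.kmeans import H2OKMeansEstimator

h2o.init(max_mem_size_GB=4)

def testH2O_kmeans(X, k):
    # H2O reads its data from a file, so dump the array to a temporary CSV first
    temp_file = tempfile.NamedTemporaryFile().name
    np.savetxt(temp_file, np.c_[X], delimiter=",")

    cls = H2OKMeansEstimator(k=k, standardize=True)
    blobdata = h2o.import_file(temp_file)

    # Time only the training call, not the CSV export or the import into H2O
    tik = time.time()
    cls.train(x=list(range(blobdata.ncol)), training_frame=blobdata)
    fit_time = time.time() - tik

    os.remove(temp_file)
    return fit_time

# .values replaces as_matrix(), which recent pandas releases no longer provide
piece_of_dataset = (pd.read_csv(census_csv_file, iterator=True)
                    .get_chunk(500000)
                    .drop('caseid', axis=1)
                    .values)

time_results = {4: [], 8:[], 12:[]}
dataset_sizes ...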
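The timing pattern inside testH2O_kmeans, where only the training call sits between the two clock readings, can be isolated into a small helper. The following is a minimal sketch and not the experiment loop itself: timed, fake_fit, and example_sizes are hypothetical names introduced for illustration, fake_fit is a stand-in that does no clustering, and the sizes are arbitrary rather than the ones used in the experiment.

```python
import time

def timed(fn, *args, **kwargs):
    """Call fn once and return (elapsed_seconds, result)."""
    tik = time.perf_counter()
    result = fn(*args, **kwargs)
    return time.perf_counter() - tik, result

def fake_fit(n_rows, k):
    # Hypothetical stand-in for a clustering fit: no real model is trained,
    # it only does an amount of work that grows with n_rows.
    return sum(i % k for i in range(n_rows))

# Arbitrary illustrative sizes (assumption), not the experiment's values.
example_sizes = [1000, 2000]

# One list of fit times per value of k, in the same shape as time_results above.
time_results = {4: [], 8: [], 12: []}
for k in time_results:
    for n_rows in example_sizes:
        elapsed, _ = timed(fake_fit, n_rows, k)
        time_results[k].append(elapsed)
```

Keeping data preparation outside the timed call means the recorded numbers reflect the fit alone, which is what makes a comparison across libraries meaningful.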