Summary
In this chapter, we surveyed H2O's capabilities for model building at scale. We learned which data sources we can ingest into our H2O clusters and which file formats are supported. We learned how this data moves from the source to the H2O cluster, and how the H2OFrame API provides a single handle in the IDE that represents the distributed in-memory data on the H2O cluster as a single two-dimensional data structure. We then learned the many ways in which we can manipulate data through the H2OFrame API and how to export it to external systems when needed.
We then surveyed the core of H2O model building at scale: H2O's many state-of-the-art distributed unsupervised and supervised learning algorithms. We put these into context by surveying the model capabilities around them, from training, evaluating, and explaining models to using model artifacts to retrain, score, and inspect models.
With this map of the landscape firmly in hand, we...