H2O data capabilities during model building
Recall that H2O model building at scale is performed using H2O 3 or its extension, Sparkling Water, which wraps H2O 3 with Spark capabilities. The H2O 3 API has extensive data capabilities that are used in the model building process, and the Sparkling Water API inherits these and adds further capabilities from Spark. These capabilities fall into the following three broad categories:
- Ingesting data from the source to the H2O cluster
- Manipulating data on the H2O cluster
- Exporting data from the H2O cluster to an external destination
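To make the three categories concrete, here is a minimal sketch using the H2O 3 Python API. It is an illustration under stated assumptions rather than a recommended pipeline: the function name `ingest_manipulate_export`, the file paths, and the column names `label`, `amount`, and `income` are all hypothetical, and `h2o.init()` assumes a local or already reachable cluster.

```python
def ingest_manipulate_export(source_csv: str, export_csv: str) -> int:
    """Run the three data stages on an H2O cluster and return the row count."""
    # Imported inside the function so the sketch can be loaded
    # even where the h2o package is not installed.
    import h2o

    # Connect to a running cluster, or start a local one (assumption).
    h2o.init()

    # 1. Ingest: load data from the source into the cluster as an H2OFrame.
    frame = h2o.import_file(source_csv)

    # 2. Manipulate: these operations execute on the cluster, not in the IDE.
    #    The column names below are hypothetical.
    frame["label"] = frame["label"].asfactor()  # treat the target as categorical
    frame["ratio"] = frame["amount"] / frame["income"]  # derive a new column

    # 3. Export: write the frame from the cluster to an external destination.
    h2o.export_file(frame, export_csv, force=True)
    return frame.nrows
```

A call such as `ingest_manipulate_export("loans.csv", "loans_prepared.csv")` would run all three stages in order. Note that the code never refers to cluster nodes or data partitions; the same calls are used whether the cluster has one node or many.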
As emphasized in previous chapters, the H2O cluster architecture (H2O 3 or Sparkling Water) lets model building scale with the size of the cluster, yet that architecture is abstracted away from the data scientist, who builds models by writing H2O code in the IDE.
The following diagram gives an overview of H2O data capabilities, which are then described in more detail: