Ensuring H2O model reproducibility
In a laboratory or experimental setting, repeating a process under the same protocols and conditions should lead to similar results. Natural variability may of course occur, but it can be measured and attributed to appropriate factors. This property is termed repeatability. Enterprise data scientists should ensure that their model builds are well coded and sufficiently documented to make the process repeatable.
Reproducibility in the context of model building is a much stronger condition: repeating the process must yield identical results, not merely similar ones. Reproducibility may be required from a regulatory or compliance perspective.
At a high level, reproducibility requires the same hardware, software, data, and settings. Let's review these requirements specifically for H2O setups. We begin with two cases that depend on the H2O cluster type.
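Before turning to the two cases, it may help to see how the four requirements (hardware, software, data, and settings) could be recorded for a model build. The following is a minimal Python sketch that uses only the standard library and is not part of H2O. The function name `build_fingerprint`, the sample data, the settings dictionary, and the version string are all hypothetical and chosen purely for illustration. The sketch hashes the training data and combines that digest with hardware, software, and settings details, so that two builds can be compared before their results are.

```python
import hashlib
import json
import os
import platform


def build_fingerprint(data_bytes, settings, software_versions):
    """Summarize hardware, software, data, and settings as one SHA-256 digest.

    Hypothetical helper for illustration only; it is not an H2O function.
    Returns the digest together with the record it was computed from.
    """
    record = {
        "hardware": {
            "machine": platform.machine(),
            "cpu_count": os.cpu_count(),
        },
        # Merge the caller-supplied versions with the running Python version.
        "software": dict(software_versions, python=platform.python_version()),
        "data_sha256": hashlib.sha256(data_bytes).hexdigest(),
        "settings": settings,
    }
    # Sorting keys makes the serialized form independent of insertion order.
    canonical = json.dumps(record, sort_keys=True)
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return digest, record


# Hypothetical inputs: a tiny CSV payload, illustrative model settings,
# and a placeholder version string (assumptions, not real values).
data = b"x,y\n1,2\n3,4\n"
settings = {"algorithm": "gbm", "seed": 12345, "ntrees": 50}
versions = {"h2o": "placeholder-version"}

first, _ = build_fingerprint(data, settings, versions)
second, _ = build_fingerprint(data, settings, versions)
changed, _ = build_fingerprint(data, dict(settings, seed=54321), versions)

print(first == second)   # True: same hardware, software, data, and settings
print(first == changed)  # False: a different seed changes the fingerprint
```

A matching fingerprint does not by itself guarantee identical model results; it only confirms that the recorded inputs to the build agree. Whether those inputs are sufficient depends on the cluster configuration.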
Case 1 – Reproducibility in single-node clusters
A single-node cluster is the simplest H2O hardware configuration...