The components of H2O machine learning at scale
As introduced in the previous chapter and emphasized throughout this book, H2O machine learning overcomes problems of scale. The following is a brief introduction to each component of H2O machine learning at scale and how each component overcomes these challenges.
H2O Core – in-memory distributed model building
H2O Core allows a data scientist to write code that builds models using well-known machine learning algorithms. The coding experience is through an H2O API expressed in Python, R, or Java/Scala, written in the data scientist's favorite client or IDE, for example Python in a Jupyter notebook. The actual computation of model building, however, takes place on an enterprise server cluster (not in the IDE environment) and leverages the server cluster's vast pool of memory and CPUs to run machine learning algorithms against massive data volumes.
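A minimal sketch of this workflow using H2O's Python client is shown below. It assumes an H2O cluster is already running and reachable; the hostname, file path, target column, and hyperparameters are placeholders, not values from this book. Note that only the lightweight client runs in the notebook, while the data frame and the training computation live on the cluster.

```python
# Illustrative sketch only: requires a live H2O cluster to actually run.
import h2o
from h2o.estimators import H2OGradientBoostingEstimator

# Attach the Python client to the remote cluster (address is a placeholder).
h2o.init(ip="h2o-cluster.example.com", port=54321)

# The file is loaded into the cluster's distributed memory, not the client.
train = h2o.import_file("hdfs://namenode/path/to/training_data.csv")

# Model building executes on the cluster's CPUs; the client only coordinates.
model = H2OGradientBoostingEstimator(ntrees=100, max_depth=5, seed=1)
model.train(x=train.columns[:-1], y=train.columns[-1], training_frame=train)
```

The key design point is that the same few lines of client code work unchanged whether the cluster holds megabytes or terabytes of data.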
So, how does this work? First, data used for model building is partitioned...
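The partition-and-aggregate idea behind this can be illustrated with a small, self-contained Python sketch. This is a conceptual analogy, not H2O's actual implementation: here threads on one machine stand in for cluster nodes, each chunk stands in for a node's in-memory partition, and the final combine step stands in for the cluster's reduce phase.

```python
from concurrent.futures import ThreadPoolExecutor

def partition(data, n_parts):
    """Split a dataset into roughly equal chunks, one per 'node'."""
    k, m = divmod(len(data), n_parts)
    return [data[i * k + min(i, m):(i + 1) * k + min(i + 1, m)]
            for i in range(n_parts)]

def partial_stats(chunk):
    """Work each 'node' does locally on its own in-memory partition."""
    return (sum(chunk), len(chunk))

def distributed_mean(data, n_parts=4):
    """Map the local computation over all partitions, then reduce."""
    parts = partition(data, n_parts)
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        partials = list(pool.map(partial_stats, parts))
    total, count = map(sum, zip(*partials))  # reduce step on the driver
    return total / count

# Same answer as computing the mean on a single machine.
print(distributed_mean(list(range(1, 101))))  # → 50.5
```

Because each partition's statistics are computed where that partition resides, only tiny partial results cross the network, which is what makes this pattern scale to data far larger than any single machine's memory.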