Machine learning with H2O and Spark
H2O is an open source system for machine learning. It offers a rich set of machine learning algorithms and a web-based user interface for data processing. Machine learning applications for H2O can be developed in Java, Scala, Python, and R, and H2O can interface with Spark, HDFS, Amazon S3, SQL databases, and NoSQL databases. H2O also provides H2O Flow, an IPython-like notebook that lets you combine code execution, text, mathematics, plots, and rich media in a single document. Sparkling Water is H2O's product for running H2O on Spark.
Why Sparkling Water?
Sparkling Water combines the strengths of Spark and H2O:
Spark provides mature APIs, RDDs, and multitenant contexts
H2O provides speed, columnar compression, machine learning, and deep learning algorithms
Both the Spark and H2O contexts reside in a shared executor JVM, so Spark RDDs and H2O RDDs can be shared between them
An application flow on YARN
The steps involved in a Sparkling Water application submitted...