Now it's time to fire up a Spark cluster, which gives us all the functionality of Spark while also allowing us to use H2O algorithms and visualize our data. As always, we must first download the Spark 2.1 distribution from http://spark.apache.org/downloads.html and declare the execution environment. For example, if you download spark-2.1.1-bin-hadoop2.6.tgz from the Spark download page, you can prepare the environment as follows:
tar -xvf spark-2.1.1-bin-hadoop2.6.tgz
export SPARK_HOME="$(pwd)/spark-2.1.1-bin-hadoop2.6"
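Before going further, it can be worth confirming that SPARK_HOME really points at an unpacked distribution. The following is only a minimal sketch: check_spark_home is a hypothetical helper of our own, not part of Spark or Sparkling Water, and it assumes that finding an executable bin/spark-shell launcher is a good enough sign of a usable installation.

```shell
# Hypothetical helper (not part of Spark): returns 0 when the given
# directory looks like an unpacked Spark distribution.
check_spark_home() {
  dir="$1"
  # An unset or empty path cannot be a valid installation.
  [ -n "$dir" ] || return 1
  # Assumption: an executable bin/spark-shell launcher is enough evidence.
  [ -x "$dir/bin/spark-shell" ] || return 1
  return 0
}

# Report on the current environment without aborting the session.
if check_spark_home "${SPARK_HOME:-}"; then
  echo "SPARK_HOME looks valid: ${SPARK_HOME}"
else
  echo "SPARK_HOME is unset or does not contain bin/spark-shell" >&2
fi
```

If the check fails, the usual causes are running the export command from a directory other than the one where the archive was unpacked, or a mismatch between the archive name and the directory name.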
When the environment is ready, we can start the interactive Spark shell with the Sparkling Water packages and this book's package:
export SPARKLING_WATER_VERSION="2.1.12"
export SPARK_PACKAGES=\
"ai.h2o:sparkling-water-core_2.11:${SPARKLING_WATER_VERSION},\
ai.h2o:sparkling-water-repl_2.11:$...