Pointing to an external Spark Cluster
Running Zeppelin with the built-in Spark is all well and good, but in most cases, we'll want the Spark jobs initiated by Zeppelin to execute on a cluster of workers. Achieving this is pretty simple: we need to configure Zeppelin's Spark master property to point to an external Spark master URL. As an example, let's take a simple standalone external Spark cluster running on my local machine. Please note that we will have to run Zeppelin on a different port, because the Zeppelin UI conflicts with the Spark standalone cluster's master web UI over port 8080.
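Since the Spark standalone master's web UI binds to port 8080 by default, one simple way to move Zeppelin off that port is to set ZEPPELIN_PORT in conf/zeppelin-env.sh before starting Zeppelin. The value 8180 here is just an arbitrary free port chosen for illustration:

export ZEPPELIN_PORT=8180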
Let's bring up the Spark cluster. From inside your Spark directory, execute the following:
sbin/start-all.sh
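Once the script finishes, it's worth confirming that the master and at least one worker actually came up before touching Zeppelin's configuration. On a standalone cluster, the daemons run as Master and Worker JVMs, so jps (bundled with the JDK) should list them; the PIDs below are, of course, illustrative:

jps
14321 Master
14402 Worker

You can also browse the master web UI at http://localhost:8080 to see the registered workers and the spark:// master URL shown at the top of the page.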
How to do it…
Now, let's modify conf/interpreter.json and conf/zeppelin-env.sh to point the master property to the host on which the Spark master is running. In this case, it will be my localhost, with the port being 7077, which is the default master port. The conf/interpreter.json file looks like the following...
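The exact layout of this file varies between Zeppelin versions, but the fragment that matters is the master entry inside the Spark interpreter's properties map. As a rough sketch, after editing it should contain something like this:

"properties": {
  "master": "spark://localhost:7077",
  ...
}

Correspondingly, conf/zeppelin-env.sh exports the MASTER environment variable with the same URL:

export MASTER=spark://localhost:7077

After saving both files, restart Zeppelin with bin/zeppelin-daemon.sh restart so that the Spark interpreter picks up the new master.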