Building and running standalone programs
So far, we have interacted exclusively with Spark through the Spark shell. In the section that follows, we will build a standalone application and launch a Spark program either locally or on an EC2 cluster.
Running Spark applications locally
The first step is to write the build.sbt file, as you would for any standard Scala project. The Spark binary that we downloaded needs to be run against Scala 2.10 (you need to compile Spark from source to run against Scala 2.11; this is not difficult to do, just follow the instructions at http://spark.apache.org/docs/latest/building-spark.html#building-for-scala-211):
// build.sbt file
name := "spam_mi"

scalaVersion := "2.10.5"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.4.1"
)
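The build file compiles whatever Scala sources live under src/main/scala. As a minimal sketch of what the application's entry point might look like, the following object creates a SparkContext, runs a simple word count, and shuts down; the object name SpamMain, the input path, and the word-count logic are placeholder assumptions, not the actual spam_mi program:

// src/main/scala/SpamMain.scala -- illustrative placeholder, not the spam_mi code
import org.apache.spark.{SparkConf, SparkContext}

object SpamMain {
  def main(args: Array[String]): Unit = {
    // The master is left unset here; it is supplied at launch time by spark-submit.
    val conf = new SparkConf().setAppName("spam_mi")
    val sc = new SparkContext(conf)

    // Placeholder input path: count word occurrences in a text file.
    val lines = sc.textFile("data/sample.txt")
    val wordCounts = lines
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    wordCounts.take(10).foreach(println)
    sc.stop()
  }
}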
We then run sbt package to compile and build a jar of our program. The jar will be built in target/scala-2.10/ and called spam_mi_2.10-0.1-SNAPSHOT.jar.
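To launch the packaged jar locally, we can pass it to the spark-submit script that ships with the Spark distribution. The command below is a sketch: the class name SpamMain is an assumption and should be replaced by the fully qualified name of your application's main object.

spark-submit \
  --class SpamMain \
  --master "local[*]" \
  target/scala-2.10/spam_mi_2.10-0.1-SNAPSHOT.jar

The --master "local[*]" option runs the application on the local machine, using as many worker threads as there are cores.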
. You can try this with the example code provided...