Building your Spark job with Maven
Maven is an open source Apache build tool that can be used to build Spark jobs written in Java or Scala. As of version 2.0.0, the Building Spark page states that Maven is the official recommendation for packaging Spark and is the "build of reference". As with sbt, you can include the Spark dependency through Maven Central, which simplifies the build process. Also as with sbt, you can either use a plugin to package Spark and all of your dependencies into a single JAR file, or build Spark as a monolithic JAR file with sbt/sbt assembly and include that.
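As a rough illustration of the two ideas above, the following pom.xml fragment declares the Spark dependency from Maven Central and binds a packaging plugin to the package phase. It is a minimal sketch, not the build file developed in this section: the Scala suffix (_2.11), the version numbers, and the choice of the Maven Shade plugin as the single-JAR plugin are all assumptions made for the example.

```xml
<!-- Sketch only: fragments that would sit inside the <project> element of a pom.xml. -->
<!-- Assumption: Spark 2.0.0 built for Scala 2.11; adjust both to match your cluster. -->
<dependencies>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.0.0</version>
  </dependency>
</dependencies>

<build>
  <plugins>
    <!-- Assumption: the Shade plugin is used to bundle the job and its dependencies into one JAR. -->
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-shade-plugin</artifactId>
      <version>2.4.3</version>
      <executions>
        <execution>
          <!-- Run the shade goal whenever "mvn package" is invoked. -->
          <phase>package</phase>
          <goals>
            <goal>shade</goal>
          </goals>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
```

With a configuration along these lines, running mvn package would resolve Spark from Maven Central and produce a single JAR containing the job and its dependencies.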
To illustrate the build process for Spark jobs with Maven, this section uses Java as the example language, as Maven is more commonly used to build Java projects. As a first step, let's take a Spark job that already works and go through the process of creating a build file for it. We can start by copying the GroupByTest example into a new directory and generating the Maven template, as shown here:
mkdir example-java-build...