Running the examples
The source code of all examples is available at https://github.com/learninghadoop2/book-examples.
Gradle (http://www.gradle.org/) scripts and configurations are provided to compile most of the Java code. The gradlew script included with the examples will bootstrap Gradle and use it to fetch dependencies and compile the code.
JAR files can be created by invoking the jar task via the gradlew script, as follows:
$ ./gradlew jar
Jobs are usually executed by submitting a JAR file using the hadoop jar command, as follows:
$ hadoop jar example.jar <MainClass> [-libjars $LIBJARS] arg1 arg2 … argN
The optional -libjars parameter takes a comma-separated list of third-party JAR files that the job needs at runtime and ships them to the remote nodes.
Note
Some of the frameworks we will work with, such as Apache Spark, come with their own build and package management tools. Additional information and resources will be provided for these particular cases.
The copyJar Gradle task can be used to download third-party dependencies into build/libjars/<example>/lib, as follows:
$ ./gradlew copyJar
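The -libjars option expects a comma-separated list of JAR paths. It is handled by Hadoop's GenericOptionsParser, so it only takes effect when <MainClass> is launched through ToolRunner (or otherwise parses Hadoop's generic options). The following is a minimal sketch, assuming the copyJar layout described above; the helper name libjars_from_dir is hypothetical and not part of the examples repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper: join every .jar file in a directory into the
# comma-separated list that -libjars expects.
libjars_from_dir() {
  local dir="$1" list="" jar
  for jar in "$dir"/*.jar; do
    # With no match, the glob stays literal; skip it so the list stays empty.
    [ -e "$jar" ] || continue
    # Append the path, adding a comma only after the first entry.
    list="${list:+$list,}$jar"
  done
  printf '%s\n' "$list"
}
```

For example, after running copyJar (with <example> and <MainClass> replaced by real values):
$ LIBJARS=$(libjars_from_dir build/libjars/<example>/lib)
$ hadoop jar example.jar <MainClass> -libjars "$LIBJARS" arg1 arg2 … argN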
For convenience, we provide a fatJar Gradle task that bundles the example classes and their dependencies into a single JAR file. Although this approach is discouraged in favor of using -libjars, it might come in handy when dealing with dependency issues.
The following command will generate build/libs/<example>-all.jar:
$ ./gradlew fatJar
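Because the fat JAR already contains the dependencies, it can be submitted with hadoop jar without any -libjars option. The sketch below assumes it is run from the example's project directory; the function name submit_fat_jar and the HADOOP variable (set HADOOP=echo to print the command instead of running it) are hypothetical conventions of this sketch, not of Hadoop or the examples repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper: submit an example packaged by `./gradlew fatJar`.
# No -libjars is passed because the dependencies are bundled in the JAR.
submit_fat_jar() {
  local example="$1" main_class="$2"
  shift 2
  local jar="build/libs/${example}-all.jar"
  if [ ! -f "$jar" ]; then
    echo "missing $jar; run ./gradlew fatJar first" >&2
    return 1
  fi
  # HADOOP defaults to the real hadoop launcher; override it for a dry run.
  "${HADOOP:-hadoop}" jar "$jar" "$main_class" "$@"
}
```

For example: $ submit_fat_jar <example> <MainClass> arg1 arg2 … argN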