Executing Scalding in a Hadoop cluster
Deploying an application requires using a build tool to package it into a JAR file and copying that file to a client node of the Hadoop cluster. Execution is then straightforward and very similar to submitting any other JAR file to a Hadoop cluster, as shown in the following command:
$ hadoop jar myjar.jar com.twitter.scalding.Tool mypackage.MyJob --hdfs --input /data/set1/ --output /output/res1/
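For reference, a minimal Scalding job that could sit behind this command might look like the following sketch. Only the class name mypackage.MyJob and the --input/--output arguments come from the command above; the word-count logic is purely illustrative:

class MyJob(args: Args) extends Job(args) {
  // Read lines from the path passed as --input
  TextLine(args("input"))
    // Split each line into words
    .flatMap('line -> 'word) { line: String => line.split("""\s+""") }
    // Count occurrences of each word
    .groupBy('word) { _.size }
    // Write tab-separated results to the path passed as --output;
    // Tsv(args("output")) is the sink whose folder is overwritten on each run
    .write(Tsv(args("output")))
}

(The class lives in package mypackage and needs import com.twitter.scalding._.) The JAR itself can be built with any build tool; a minimal sbt sketch is shown below, where the artifact name and all version numbers are assumptions rather than values from the text:

// build.sbt -- illustrative versions only
name := "myjar"
scalaVersion := "2.10.4"
libraryDependencies += "com.twitter" %% "scalding-core" % "0.10.0"

// project/plugins.sbt -- sbt-assembly builds the fat JAR submitted to Hadoop
// addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.10")

Running sbt assembly then produces the application JAR under the target directory, ready to be copied to the client node.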
The submitted job runs with the HDFS permissions of the user who submitted it. If the read and write permissions are satisfied, it will process the input and store the resulting data.
Note
When storing data in HDFS, a Scalding application writes to the output folder defined by a sink in the job. Any existing content in that folder is purged every time the job begins its execution.
Internally, the JAR file is submitted to the JobTracker service, which orchestrates the execution of the map and reduce phases. The actual tasks are executed...