Configuring using Hadoop parameters
Many Hadoop configuration parameters can be tuned at job execution time. A set of default values is assigned when the job starts, based on the Hadoop configuration files; we can, however, override these defaults on the command line.
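If you want to inspect the defaults that will apply before overriding anything, Hadoop's client API can report them. The following is a minimal sketch (the two keys queried here are just illustrations; a JobConf also loads the mapred-*.xml files in addition to the core ones):

import org.apache.hadoop.mapred.JobConf

object ShowDefaults extends App {
  // A fresh JobConf loads core-default.xml, core-site.xml,
  // mapred-default.xml and mapred-site.xml from the classpath.
  val conf = new JobConf()

  // Print the effective value, or a placeholder if the key is unset.
  println("mapred.reduce.tasks    = " + conf.get("mapred.reduce.tasks", "<unset>"))
  println("mapred.child.java.opts = " + conf.get("mapred.child.java.opts", "<unset>"))
}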
We can, for example, set the amount of memory allocated to each map and reduce task of that job, as well as the default number of reduce tasks per job. Note that all Hadoop parameters have to be added right after com.twitter.scalding.Tool, as in the following example:
$ hadoop jar myjar.jar com.twitter.scalding.Tool \
  -D mapred.child.java.opts=-Xmx2048m \
  -D mapred.reduce.tasks=20 \
  com.company.myclass \
  --hdfs --input $input --output $output
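The same parameters can also be baked into the job itself. The following is a minimal sketch, assuming a Scalding version (0.9 or later) in which Job exposes a config member that can be overridden; MyJob and its trivial pipeline are hypothetical:

import com.twitter.scalding._

class MyJob(args: Args) extends Job(args) {
  // Merge extra Hadoop parameters into this job's configuration.
  override def config: Map[AnyRef, AnyRef] =
    super.config ++ Map(
      "mapred.child.java.opts" -> "-Xmx2048m",
      "mapred.reduce.tasks"    -> "20")

  // A trivial pipeline, just to make the sketch complete and runnable.
  TypedPipe.from(TextLine(args("input")))
    .write(TypedTsv[String](args("output")))
}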
To find out more about the available Hadoop parameters, search the web for the MapReduce default configuration values; the mapred-default.xml documentation for your Hadoop distribution lists every parameter along with its default value and a short description.