Scalding execution throttling
Scalding execution throttling is a Hadoop-specific technique. It is worth highlighting here because Scalding applications in production may read billions of rows of data.
For resource management, Hadoop offers a number of schedulers. Each cluster has a fixed capacity, for example, 600 simultaneous map tasks and 300 simultaneous reduce tasks. The most commonly used scheduler in Hadoop is the Fair Scheduler, which attempts to assign resources to jobs so that, on average, every job receives an equal share.
There are occasions, however, when we want to reserve resources for business-critical jobs or throttle particular jobs. For example, we may want to limit the resources available to newer members of the team, or to a new beta release of an application.
To do this, we can log in to the JobTracker node over SSH and add a new pool to the file fair-scheduler.xml, as shown in the following code:
<pool name="staging_pool">
  <!-- Cap this pool at 50 concurrent map tasks -->
  <maxMaps>50</maxMaps>
  <!-- The reduce-task cap below is an illustrative value -->
  <maxReduces>25</maxReduces>
</pool>
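Once the pool exists, a job still has to be directed into it. As a minimal sketch, assuming the classic MRv1 Fair Scheduler where a job selects its pool through the mapred.fairscheduler.pool property, a Scalding job can place itself in staging_pool by overriding its config; the class name, input/output arguments, and pipeline logic here are placeholders:

import com.twitter.scalding._

// Illustrative sketch: a word-count job that routes itself into the
// "staging_pool" defined in fair-scheduler.xml by setting the
// Fair Scheduler's job-to-pool property on the job configuration.
class ThrottledWordCount(args: Args) extends Job(args) {

  // Merge the pool assignment into the job's Hadoop configuration
  override def config: Map[AnyRef, AnyRef] =
    super.config + ("mapred.fairscheduler.pool" -> "staging_pool")

  TypedPipe.from(TextLine(args("input")))
    .flatMap(_.split("\\s+"))
    .map(word => (word, 1L))
    .sumByKey
    .write(TypedTsv[(String, Long)](args("output")))
}

Alternatively, the same property can be passed at submit time as a -D flag through Hadoop's generic options, so the pool can be chosen, or changed, without touching the job code.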