Job scheduling in YARN
Most clusters are multitenant by nature; that is, a number of teams or people share the cluster's resources. Allocating resources to satisfy the needs of all these tenants is therefore important, and it is the responsibility of the scheduler. Giving each team or person an individual cluster is not viable, because it results in poor utilization.
YARN provides a pluggable model for scheduling policies. The initial versions of Hadoop had a simple First In, First Out (FIFO) scheduler. However, FIFO was found to be inadequate in dealing with the complexities of multitenancy. We will discuss two other scheduling strategies that are used in Hadoop today: CapacityScheduler and FairScheduler.
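As a rough illustration of this pluggable model, the scheduler is typically selected in yarn-site.xml through the yarn.resourcemanager.scheduler.class property. The fragment below is a minimal sketch rather than a complete configuration; it assumes a stock Apache Hadoop installation and picks CapacityScheduler purely as an example.

```xml
<!-- yarn-site.xml (fragment): selects the scheduler implementation used by the ResourceManager -->
<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <!-- Assumption for illustration: CapacityScheduler is the chosen policy -->
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
</property>
```

Replacing the value with org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler would select FairScheduler instead. A change of scheduler class generally takes effect only after the ResourceManager is restarted.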
CapacityScheduler
The concept behind CapacityScheduler is to guarantee each tenant its promised capacity on a shared cluster. If other tenants use less than their requested capacity, the scheduler allows the tenant to tap into these unused resources. The number one goal of CapacityScheduler is not to allow a single...