Summary
YARN has opened up the Hadoop ecosystem to a wide range of applications. It has not only alleviated the scaling bottlenecks of traditional MapReduce-based Hadoop but also helped organizations use their infrastructure more efficiently. This was made possible by:
- Separating application-specific logic from resource management. The ResourceManager is solely responsible for cluster resource management and is agnostic to the applications it serves.
- Providing common, generic abstractions for resource specifications. Resources are specified in terms of virtual cores and memory.
- Maintaining backward compatibility with existing Hadoop APIs. Existing Hadoop programs run on YARN after recompilation, without any code changes.
- Providing a variety of pluggable scheduling policies such as FairScheduler and CapacityScheduler. Pluggable policies make it easy to bring other computing paradigms on board.
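The first, second, and fourth points can be illustrated with a small, self-contained sketch. It is a toy model written for this summary and does not use the YARN API: the names `SchedulerSketch`, `Resource`, `Request`, `Policy`, `FIFO`, and `SMALLEST_FIRST` are all hypothetical, and the two policies are simple stand-ins rather than reimplementations of FairScheduler or CapacityScheduler. The allocator sees only resource sizes (memory and virtual cores) and knows nothing about what the applications do, and the ordering policy can be swapped without touching the allocator.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Toy model only: every type here is hypothetical and is NOT part of the YARN API. */
final class SchedulerSketch {

    /** Generic resource specification: memory in MB plus virtual cores. */
    static final class Resource {
        final int memoryMb;
        final int vcores;

        Resource(int memoryMb, int vcores) {
            this.memoryMb = memoryMb;
            this.vcores = vcores;
        }

        boolean fitsIn(Resource free) {
            return memoryMb <= free.memoryMb && vcores <= free.vcores;
        }

        Resource minus(Resource other) {
            return new Resource(memoryMb - other.memoryMb, vcores - other.vcores);
        }
    }

    /** A container request: the allocator sees only a name and a size. */
    static final class Request {
        final String app;
        final Resource ask;

        Request(String app, Resource ask) {
            this.app = app;
            this.ask = ask;
        }
    }

    /** Pluggable policy: decides the order in which pending requests are considered. */
    interface Policy {
        List<Request> order(List<Request> pending);
    }

    /** Consider requests in arrival order. */
    static final Policy FIFO = pending -> new ArrayList<>(pending);

    /** Consider the smallest memory asks first (a stand-in for an alternative policy). */
    static final Policy SMALLEST_FIRST = pending -> {
        List<Request> sorted = new ArrayList<>(pending);
        sorted.sort(Comparator.comparingInt((Request r) -> r.ask.memoryMb));
        return sorted;
    };

    /** Grants each request that still fits, in policy order; returns the granted app names. */
    static List<String> allocate(Resource capacity, List<Request> pending, Policy policy) {
        List<String> granted = new ArrayList<>();
        Resource free = capacity;
        for (Request r : policy.order(pending)) {
            if (r.ask.fitsIn(free)) {
                free = free.minus(r.ask);
                granted.add(r.app);
            }
        }
        return granted;
    }

    public static void main(String[] args) {
        Resource cluster = new Resource(8192, 8);
        List<Request> pending = List.of(
                new Request("mapreduce", new Resource(6144, 4)),
                new Request("spark", new Resource(4096, 2)),
                new Request("storm", new Resource(2048, 2)));
        System.out.println(allocate(cluster, pending, FIFO));           // [mapreduce, storm]
        System.out.println(allocate(cluster, pending, SMALLEST_FIRST)); // [storm, spark]
    }
}
```

With the same cluster capacity and the same pending requests, changing only the policy changes which applications receive resources. That is the property that lets a single resource manager serve very different workloads.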
Developing a newer computing paradigm on Hadoop is as simple as implementing a client and an Application Master...