Chapter 1. Need for YARN
YARN stands for Yet Another Resource Negotiator. YARN is a generic resource platform to manage resources in a typical cluster. YARN was introduced with Hadoop 2.0, which is an open source distributed processing framework from the Apache Software Foundation.
In 2012, YARN became one of the subprojects of the larger Apache Hadoop project. YARN is also coined by the name of MapReduce 2.0. This is since Apache Hadoop MapReduce has been re-architectured from the ground up to Apache Hadoop YARN.
Think of YARN as a generic computing fabric to support MapReduce and other application paradigms within the same Hadoop cluster; earlier, this was limited to batch processing using MapReduce. This really changed the game to recast Apache Hadoop as a much more powerful data processing system. With the advent of YARN, Hadoop now looks very different compared to the way it was only a year ago.
YARN enables multiple applications to run simultaneously on the same shared cluster and allows applications to negotiate resources based on need. Therefore, resource allocation/management is central to YARN.
YARN has been thoroughly tested at Yahoo! since September 2012. It has been in production across 30,000 nodes and 325 PB of data since January 2013.
Recently, Apache Hadoop YARN won the Best Paper Award at ACM Symposium on Cloud Computing (SoCC) in 2013!