Recent developments in YARN architecture
The ResourceManager is a single point of failure, and it may need to restart for various reasons: bugs, hardware failure, deliberate downtime for upgrades, and so on.
We have already seen how crucial the ResourceManager's role in the YARN architecture is. Because it is a single point of failure, if the ResourceManager in a cluster goes down, all the applications running on that cluster are lost.
ResourceManager HA therefore became a high priority in recent YARN development. This work not only covers ResourceManager HA, but also makes failover transparent to users, so that they do not have to monitor such events explicitly and resubmit their jobs.
Recovery was overly complex in MRv1 because the JobTracker had to save too much metadata: both cluster state and per-application running state. As a result, if the JobTracker died, all the applications in a running state were lost.
The development of ResourceManager recovery will be done in two phases...