ApplicationMaster failures
Recovering an application's state after an ApplicationMaster failure is the responsibility of the ApplicationMaster itself. When an ApplicationMaster fails, the ResourceManager simply starts another container with a new ApplicationMaster running in it, which counts as another application attempt. The new ApplicationMaster must recover the state of its predecessor, and this is possible only if ApplicationMasters persist their state to an external location where a later attempt can read it. Alternatively, an ApplicationMaster can skip recovery and rerun the application from scratch.
For example, an ApplicationMaster can recover its completed jobs. However, if jobs that were running, or that completed, during the ApplicationMaster's recovery time frame are halted for some reason, their state is discarded and the ApplicationMaster simply reruns them from scratch.
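The recovery pattern described above can be sketched in a few lines of Java. This is a minimal illustration only, not YARN code: the class `AmRecoverySketch`, its methods, and the use of a local temporary file as the "external location" are all assumptions made for the example. A real ApplicationMaster would more likely persist its state to a shared store such as HDFS. The sketch records each completed task, then shows a second attempt reading that record and rerunning only what was not recorded as complete.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of ApplicationMaster state recovery; not part of the YARN API. */
public class AmRecoverySketch {

    /** Appends the ID of a finished task to the external state log. */
    static void recordCompleted(Path stateLog, String taskId) throws IOException {
        byte[] line = (taskId + System.lineSeparator()).getBytes(StandardCharsets.UTF_8);
        Files.write(stateLog, line, StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    /** Reads the completed task IDs; returns an empty set if nothing was persisted. */
    static Set<String> recoverCompleted(Path stateLog) throws IOException {
        Set<String> done = new LinkedHashSet<>();
        if (Files.exists(stateLog)) {
            for (String line : Files.readAllLines(stateLog, StandardCharsets.UTF_8)) {
                String id = line.trim();
                if (!id.isEmpty()) {
                    done.add(id);
                }
            }
        }
        return done;
    }

    /** Returns the tasks a new attempt must run: everything not recorded as completed. */
    static List<String> tasksToRun(List<String> allTasks, Set<String> completed) {
        List<String> pending = new ArrayList<>();
        for (String task : allTasks) {
            if (!completed.contains(task)) {
                pending.add(task);
            }
        }
        return pending;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for an external location that survives the ApplicationMaster.
        Path stateLog = Files.createTempFile("am-state", ".log");
        List<String> allTasks = Arrays.asList("task-1", "task-2", "task-3");

        // Attempt 1: two tasks complete; the attempt "fails" while task-3 is running,
        // so task-3 is never recorded and its in-flight state is lost.
        recordCompleted(stateLog, "task-1");
        recordCompleted(stateLog, "task-2");

        // Attempt 2: recover completed work and rerun only the remainder.
        Set<String> completed = recoverCompleted(stateLog);
        System.out.println("Recovered: " + completed);
        System.out.println("To rerun:  " + tasksToRun(allTasks, completed));

        Files.deleteIfExists(stateLog);
    }
}
```

If the state log is missing or empty, `recoverCompleted` returns an empty set and `tasksToRun` returns every task, which corresponds to rerunning the application from scratch.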
The YARN framework...