The Resource Manager (RM) is a single point of failure in a YARN cluster because every client request goes through it. The Resource Manager also acts as the central authority that allocates resources for the various tasks. If the Resource Manager fails, YARN as a whole becomes unavailable: clients can neither obtain information about the YARN cluster nor submit applications for execution. Therefore, it is important to implement high availability for the Resource Manager so that its failure does not bring down the cluster. The following are a few important considerations for high availability:
- Resource Manager state: It is very important to persist the Resource Manager state, because state that is held only in memory is lost when the Resource Manager fails. If the state of the Resource Manager is available even after failure, we can restart the Resource Manager from...