In the preceding recipe, we learned how to prevent software from hanging using watchdog timers. A similar technique can be used to implement a highly available system, which consists of two or more redundant software or hardware components that can perform the same function. If one of the components fails, another one can take over.
The component that is currently active should periodically advertise its health status to the other, passive components using messages called heartbeats. If the active component reports an unhealthy status, or fails to send a heartbeat within a specific amount of time, a passive component detects the failure and activates itself. When the failed component recovers, it can either transition into passive mode and monitor the now-active component for failures, or initiate a failback procedure to reclaim the active status.
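The detection logic on the passive side can be illustrated with a minimal sketch. It is not the recipe's implementation: the names `StandbyMonitor`, `Role`, `on_heartbeat`, and `poll` are hypothetical, timestamps are passed in explicitly rather than read from a clock so the behavior is easy to follow, and the 3-second timeout is an arbitrary assumption. Failback is deliberately left out.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    ACTIVE = "active"
    PASSIVE = "passive"


@dataclass
class StandbyMonitor:
    """Passive component's view of the active component's heartbeats.

    All names here are hypothetical and chosen for illustration only.
    """
    timeout: float                 # seconds of silence tolerated before takeover
    last_seen: float = 0.0         # timestamp of the most recent heartbeat
    role: Role = Role.PASSIVE

    def on_heartbeat(self, now: float, healthy: bool) -> None:
        """Record a heartbeat; an explicit unhealthy report triggers takeover."""
        self.last_seen = now
        if not healthy:
            self.role = Role.ACTIVE

    def poll(self, now: float) -> Role:
        """Take over if no heartbeat arrived within the timeout window."""
        if self.role is Role.PASSIVE and now - self.last_seen > self.timeout:
            self.role = Role.ACTIVE
        return self.role


# Usage: one healthy heartbeat at t=1.0, then silence.
monitor = StandbyMonitor(timeout=3.0)
monitor.on_heartbeat(now=1.0, healthy=True)
print(monitor.poll(now=3.5))  # 2.5 s of silence: still Role.PASSIVE
print(monitor.poll(now=4.5))  # 3.5 s of silence: becomes Role.ACTIVE
```

Both failure paths described above are covered: a missed deadline is caught by `poll`, and an explicit unhealthy report is handled in `on_heartbeat`. In a real system the timeout would typically span several heartbeat intervals so that a single delayed message does not cause a spurious takeover.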
In this recipe...