So far, we've focused mostly on the concurrency aspect of processes. However, processes are also used to create fault-tolerant and reliable systems that can continue to operate even in the presence of errors.
To build fault-tolerant applications, you must first recognize that failures exist and that most of them are unexpected. These failures range from a dependency (such as a database) being down to hardware failures. Moreover, if you're running a distributed system, you can experience other issues, such as a remote machine becoming unavailable or a network partition. Regardless of the cause, these failures must be detected so that you can limit their impact and hopefully recover from them without human intervention.
It's virtually impossible to anticipate all the possible...