Retry and circuit breaker
In this section, we discuss one of the most important resilience topics: the retry and circuit breaker patterns. It is worth getting familiar with these concepts before moving on to implementing a production cluster.
Retry
The problem that retry and circuit breaker solve stems from cascading failures, which occur when a service or function inside a call chain becomes unavailable. In the following figure, we assume that each of five different functions or services has 99% availability, meaning it fails once in every 100 calls. Because these failures compound along the chain, a client calling A observes an overall availability of only 95.09%:
Figure 7.5: A chain of functions or microservices lowers the overall availability
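The chain availability in the figure is simply the product of the individual availabilities, so a chain of n calls at 99% each yields 0.99^n overall. The following minimal Go sketch (the language and the exact chain lengths are chosen here only for illustration) reproduces the figures quoted in this section, give or take rounding in the last decimal:

```go
package main

import (
	"fmt"
	"math"
)

func main() {
	// Per-call availability assumed in the text: 99%.
	const perCall = 0.99

	// The availability of a chain of n sequential calls is the product
	// of the individual availabilities: perCall^n.
	for _, n := range []int{5, 8, 20} {
		chain := math.Pow(perCall, float64(n))
		fmt.Printf("chain of %2d calls: %.2f%% available\n", n, chain*100)
	}
}
```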
What does this imply? It means that when the chain grows to eight functions, the availability drops to 92.27%, and at 20 functions it falls to 81.79%. To reduce the failure rate, we should retry the failed call against another instance of the function, as sketched below.
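As a rough illustration of the retry idea, the following Go sketch wraps a flaky call in a retry loop with exponential backoff. The callService function, the attempt count, and the delays are hypothetical placeholders for this sketch rather than anything prescribed by the chapter:

```go
package main

import (
	"errors"
	"fmt"
	"math/rand"
	"time"
)

// callService stands in for a downstream function or service call.
// It fails roughly 1% of the time to mimic 99% availability.
func callService() (string, error) {
	if rand.Float64() < 0.01 {
		return "", errors.New("service unavailable")
	}
	return "ok", nil
}

// withRetry calls fn up to attempts times, backing off between tries.
func withRetry(attempts int, delay time.Duration, fn func() (string, error)) (string, error) {
	var lastErr error
	for i := 0; i < attempts; i++ {
		result, err := fn()
		if err == nil {
			return result, nil
		}
		lastErr = err
		time.Sleep(delay)
		delay *= 2 // simple exponential backoff
	}
	return "", fmt.Errorf("all %d attempts failed: %w", attempts, lastErr)
}

func main() {
	result, err := withRetry(3, 100*time.Millisecond, callService)
	if err != nil {
		fmt.Println("giving up:", err)
		return
	}
	fmt.Println("got:", result)
}
```

Assuming failures are independent, three attempts against a 99%-available service push the per-call success probability to roughly 1 - 0.01^3 ≈ 99.9999%, which is what makes retrying against another instance so effective.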