Implementing retry and circuit breaker policies
Services fail for various reasons. A typical response to a service failure is an HTTP response in the 5xx range. These typically highlight an issue with the hosting server or a temporary outage in the network hosting the service. Without trying to pinpoint the exact cause of the failure at the time it happens, we need to add some fail-safes to ensure the continuity of the application when these types of errors occur.
For this reason, we should use retry logic in our service calls. These will automatically resubmit the initial request if an error code is returned, which might be enough time for a transient error to resolve itself and reduce the effects that the initial error might have on the entire system and operation. In this policy, we generally allow for some time to pass between each request attempt. This sums up our retry policy.
What we don’t want to do with our retries is to continue to execute them without some form...