The circuit breaker pattern is a very useful tool for this. It allows us to quickly notice that a service is unable to process requests, so the calls to it can be short-circuited. This can happen both close to the callee (Envoy provides such a capability), or on the caller side (with the additional benefit of shaving off time from the calls). In Envoy's case, it can be as simple as adding the following to your config:
circuit_breakers:
thresholds:
- priority: DEFAULT
max_connections: 1000
max_requests: 1000
max_pending_requests: 1000
In both cases, the load caused by the calls to the service may drop, which in some cases can help the service return to normal operation.
How do we implement a circuit breaker on the caller side? Once you've made a few calls, and, say, your leaky bucket overflows, you can just stop making new calls for a specified period of time (for example, until the bucket no longer overflows). Simple and effective.