Application resiliency using timeouts and retries
When multiple microservices communicate, several things can go wrong; network and infrastructure problems are the most common causes of service degradation and outages. A service that is too slow to respond can cause cascading failures in the services that depend on it, with a ripple effect across the whole application. Microservices must therefore be designed to handle unexpected delays by setting timeouts on the requests they send to other microservices.
A timeout is the maximum amount of time a service waits for a response from another service; once that duration has passed, the response is no longer of use to the requester. When a timeout occurs, the microservice follows a contingency method, which may include serving the response from a cache or letting the request fail gracefully.
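The timeout-and-fallback behavior described above can be sketched in a few lines of Python. This is a minimal illustration, not a production pattern: the names call_with_timeout and slow_service, the in-memory dictionary used as a cache, and the thread pool size are all assumptions made for the example. The caller waits up to timeout_s seconds; if no response arrives in time, it serves the last cached value when one exists, and otherwise reports a graceful failure.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

# Shared worker pool for outbound calls (size chosen arbitrarily for this sketch).
_pool = ThreadPoolExecutor(max_workers=4)


def call_with_timeout(remote_call, timeout_s, cache, key):
    """Run remote_call, waiting at most timeout_s seconds for its result.

    Returns a (value, source) pair where source is "live", "cache", or "failed".
    """
    future = _pool.submit(remote_call)
    try:
        value = future.result(timeout=timeout_s)
    except FutureTimeout:
        # Best effort only: a call that is already running is not interrupted.
        future.cancel()
        if key in cache:
            return cache[key], "cache"   # contingency 1: serve the cached response
        return None, "failed"            # contingency 2: fail gracefully
    cache[key] = value                   # remember the last good response
    return value, "live"


def slow_service():
    """Hypothetical stand-in for a downstream service that responds too slowly."""
    time.sleep(0.5)
    return "late response"
```

In this sketch, a call such as call_with_timeout(slow_service, 0.05, cache, "price") returns the cached value for "price" if an earlier call succeeded, and (None, "failed") if the cache is empty. The slow call itself keeps running in the background, which is one reason real services usually also set a timeout on the underlying network client.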
Sometimes, issues are transient, and it makes sense to make another attempt to get a response. This approach is called a retry, where a microservice can...