Design for failure
Anything that can go wrong will go wrong.
When we build microservices, we should always be prepared for failure. There are many reasons for this, but the main one is that cloud networks can be flaky, and you lose the ability to tune switching and routing, which you could have used to optimize the system if you were running it in your own data center. In addition, we tend to build microservice architectures to scale automatically, and this scaling causes services to start and stop in unpredictable ways.
What this means for our software is that we need to think about failure up front, while we are still discussing upcoming features. We then need to design the handling of that failure into the software from the beginning, and as engineers, we need to understand these problems.
In his book Designing Data-Intensive Applications, Martin Kleppmann makes the following comment:
The bigger a system gets, the more likely it is that one of its components is broken. Over time, broken things get fixed...