Let me begin this section with a question: how confident are you about the quality of your current software stack? If your answer is something along the lines of "I don't really know until I make it fail", then we are in total agreement! If not, let me introduce you to the concept of chaos testing.
Chaos testing is a term that was initially coined by the engineering team at Netflix. The key idea behind chaos testing is to evaluate how your system behaves when its various components exhibit different types of failure. So, what kinds of failure are we talking about here? Here are a few interesting examples, ordered by their relative severity (low to high):
- A service fails to reach another service it depends on
- Calls between services exhibit high latency/jitter
- Network links experience packet...
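
To make the first two kinds of failure concrete, here is a minimal sketch of fault injection at the application level. It is not how any particular chaos testing tool works; the `FaultInjector` class, the `InjectedFault` exception, the `fetch_price_with_fallback` function, and all of their parameters are hypothetical names made up for this illustration. The idea is simply to wrap calls to a dependency so that some of them fail or are delayed, and then check that the calling code degrades gracefully.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class InjectedFault(ConnectionError):
    """Raised in place of a real network error when a fault is injected."""


class FaultInjector:
    """Wraps calls to a dependency and injects failures or extra latency.

    Hypothetical helper for illustration only; not part of any library.
    """

    def __init__(
        self,
        failure_rate: float = 0.0,
        max_extra_latency: float = 0.0,
        rng: Optional[random.Random] = None,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        if not 0.0 <= failure_rate <= 1.0:
            raise ValueError("failure_rate must be in [0, 1]")
        self.failure_rate = failure_rate
        self.max_extra_latency = max_extra_latency
        # A seeded rng and a fake sleep can be passed in to keep tests repeatable.
        self.rng = rng if rng is not None else random.Random()
        self.sleep = sleep

    def call(self, fn: Callable[[], T]) -> T:
        # Simulate latency/jitter: delay by a random amount up to the maximum.
        if self.max_extra_latency > 0:
            self.sleep(self.rng.uniform(0.0, self.max_extra_latency))
        # Simulate a dependency that cannot be reached.
        if self.rng.random() < self.failure_rate:
            raise InjectedFault("injected: dependency unreachable")
        return fn()


def fetch_price_with_fallback(
    injector: FaultInjector, fetch: Callable[[], float], fallback: float
) -> float:
    """The code under test: it should fall back to a default, not crash."""
    try:
        return injector.call(fetch)
    except ConnectionError:
        return fallback


if __name__ == "__main__":
    # With a failure rate of 1.0 every call fails, so the fallback is used.
    always_down = FaultInjector(failure_rate=1.0)
    print(fetch_price_with_fallback(always_down, lambda: 42.0, fallback=0.0))  # prints 0.0
```

Injecting the random source and the sleep function is a deliberate choice in this sketch: it lets a test pin down exactly which faults occur instead of relying on chance. Failures further down the list, such as those at the network level, generally cannot be simulated from inside the application like this and need tooling that operates on the infrastructure itself.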