Preparation – chaos engineering
On September 20, 2015, Amazon Web Services (AWS) experienced an outage that took down more than 20 services in its data centers in the US-EAST-1 region. The outage affected applications from major companies such as Tinder, Airbnb, and IMDb, as well as Amazon’s own services such as Alexa.
One AWS customer that avoided problems during the outage and remained fully operational was Netflix, the streaming service. It was able to do so because it had created a set of tools called the Simian Army, described in the blog article at https://netflixtechblog.com/the-netflix-simian-army-16e57fbab116. These tools simulated potential problems with AWS so that Netflix engineers could design ways to make their system more resilient.
Over several AWS outages, the Simian Army proved its worth, allowing Netflix to continue providing service. Soon, other companies such as Google wanted to apply the same techniques. This groundswell...