Using overload prevention mechanisms
When you run a small set of services, a misbehaving application generally causes only small problems. A data center usually has enough spare network capacity to absorb the bad behavior, and with so few services, it is usually easy to work out what caused the issue.
When you run a large number of applications, your network and your machines are usually oversubscribed. Oversubscribed means that your network and systems cannot handle all of your applications running at 100% at the same time. Oversubscription is common in networks and clusters as a way to control costs. It works because the network traffic, central processing unit (CPU), and memory usage of most applications ebb and flow, so at any given time only a few applications are running near their peak.
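To make the idea of oversubscription concrete, here is a minimal Python sketch. Every name and number in it (the App type, the four applications, the bandwidth figures, and the 1,000 Mbps link) is a hypothetical value chosen for illustration, not a measurement from a real system. It shows that a link can be oversubscribed on paper yet comfortable under ordinary load, and that the margin disappears once enough applications run at their peak together.

```python
from dataclasses import dataclass


@dataclass
class App:
    name: str
    peak_mbps: float     # bandwidth the app would use running at 100% (assumed)
    typical_mbps: float  # bandwidth the app uses under ordinary load (assumed)


def oversubscription_ratio(apps, capacity_mbps):
    """Total peak demand divided by capacity; above 1.0 means oversubscribed."""
    return sum(a.peak_mbps for a in apps) / capacity_mbps


def fits(apps, capacity_mbps, at_peak=()):
    """True if demand fits when the named apps run at peak and the rest at typical load."""
    demand = sum(
        a.peak_mbps if a.name in at_peak else a.typical_mbps for a in apps
    )
    return demand <= capacity_mbps


# Hypothetical applications sharing one hypothetical 1,000 Mbps link.
APPS = [
    App("frontend", peak_mbps=600, typical_mbps=150),
    App("search", peak_mbps=500, typical_mbps=200),
    App("batch", peak_mbps=400, typical_mbps=100),
    App("logging", peak_mbps=300, typical_mbps=50),
]
CAPACITY_MBPS = 1000

if __name__ == "__main__":
    print("ratio:", oversubscription_ratio(APPS, CAPACITY_MBPS))
    print("ordinary load fits:", fits(APPS, CAPACITY_MBPS))
    print("one app at peak fits:", fits(APPS, CAPACITY_MBPS, at_peak={"search"}))
    print(
        "two apps at peak fit:",
        fits(APPS, CAPACITY_MBPS, at_peak={"frontend", "search"}),
    )
```

With these assumed figures, total peak demand is 1.8 times the link capacity, yet ordinary load uses only half of the link. One application running at peak still fits, but two running at peak at once do not.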
An application that suddenly hits a bug can fall into retry loops that quickly overwhelm a service. In addition, if some catastrophic event occurs that takes a service offline, trying to bring...