Building systems with an emergency stop
Systems are going to run amok. This is a simple truth that you need to come to terms with early when developing infrastructure tooling.
At a small company, a very small group of people usually understands the systems well and watches over every change so that they can handle problems. These people are usually the developers of the software, and if they are good, they can respond to a problem quickly.
As companies grow, jobs become more specialized, and the larger the company, the more specialized the jobs. As that happens, the first responders to major issues often lack the access or knowledge needed to deal with these problems.
This can create a critical gap between recognizing a major problem and stopping it from getting worse.
This is where giving first responders the ability to stop changes comes into play. We call this an emergency-stop ability.
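To make the idea concrete, here is a minimal sketch of one way a tool could honor an emergency stop. It is an illustration under stated assumptions, not a prescribed design: the JSON stop file, the "go" and "stop" values, and the names is_stopped and apply_changes are all hypothetical. The sketch also assumes a fail-safe policy, where an unreadable file or an unlisted workflow counts as stopped.

```python
import json
import tempfile
from pathlib import Path


def is_stopped(stop_file: Path, workflow: str) -> bool:
    """Return True if the (hypothetical) stop file halts this workflow."""
    try:
        states = json.loads(stop_file.read_text())
    except (OSError, ValueError):
        # Assumption: if the stop state cannot be read or parsed, fail safe.
        return True
    if not isinstance(states, dict):
        return True
    # Assumption: only an explicit "go" lets the workflow continue.
    return states.get(workflow, "stop") != "go"


def apply_changes(stop_file: Path, workflow: str, changes: list) -> list:
    """Apply changes one at a time, checking the stop state before each one."""
    applied = []
    for change in changes:
        if is_stopped(stop_file, workflow):
            break  # A first responder flipped the switch: make no more changes.
        applied.append(change)  # Stand-in for the real change being made.
    return applied
```

A first responder who does not know the tool's internals only needs to change one value in the stop file. Because the check runs before every change rather than once at startup, a workflow that is already in progress halts at its next step.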
Understanding emergency stops
There are...