Investigating versus fixing
When we start dealing with an outage, data loss, or other disaster, the natural inclination is to focus on finding a root cause, fixing that root cause, and then getting systems back into a working state. It makes sense, it is the obvious course of events, and it is emotionally satisfying to work through the process.
The problem with this process is that it rests on a few flawed beliefs. It is a method derived from things like getting your car or house repaired after damage or an accident. The underlying principle is that the object or system in question is very expensive to acquire and, in relative terms, cheap to repair.
It also places the value of determining why something occurred above the value of getting systems up and running again. The assumption is that if something has happened once, it is expected to happen again, and that by knowing what has failed and why, we will be able to avoid the almost inevitable...