Recovering the system
One thing we have yet to touch on is how to debug your application and do the technical work of responding to an incident. This is the third of the three pillars we named when defining incident response. We were alerted that things were not great. We communicated that we were on the case. Now we need to make things better.
How do we do that? We will discuss measuring mean time to recovery (MTTR) in Chapter 4, Postmortems, but the strategy we kept returning to earlier in this chapter was bringing the system back to a working state. That's because you don't necessarily want to jump straight into bug-hunting mode. Instead, you want to find what has changed in the system and revert it. Let's walk through the common first steps in tracking down a broken system.
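To make "find what has changed and revert it" a little more concrete, here is a minimal Python sketch. It assumes you already have a list of recent changes (deploys, config pushes, feature-flag flips) with timestamps, which in practice would come from your own deploy tooling or change log. The `Change` class, the `revert_candidates` function, and the two-hour lookback window are hypothetical choices made for illustration, not part of any particular tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Change:
    """Hypothetical record of one change to the system (deploy, config push, flag flip)."""
    when: datetime
    description: str


def revert_candidates(changes, incident_start, lookback=timedelta(hours=2)):
    """Return changes made within `lookback` before the incident started, newest first.

    The lookback window is an assumed default; widen it if nothing turns up.
    """
    window_start = incident_start - lookback
    # Keep only changes that landed shortly before (or exactly at) the incident start.
    recent = [c for c in changes if window_start <= c.when <= incident_start]
    # Newest first: the most recent change is usually the first thing to roll back.
    return sorted(recent, key=lambda c: c.when, reverse=True)


if __name__ == "__main__":
    start = datetime(2024, 1, 1, 12, 0)
    history = [
        Change(datetime(2024, 1, 1, 8, 0), "deploy api v41"),
        Change(datetime(2024, 1, 1, 11, 15), "config: raise cache TTL"),
        Change(datetime(2024, 1, 1, 11, 50), "deploy api v42"),
        Change(datetime(2024, 1, 1, 12, 30), "deploy api v43 (after incident began)"),
    ]
    for change in revert_candidates(history, start):
        print(change.when.time(), change.description)
```

Sorting newest first mirrors the reasoning above: the last change before things broke is often the likeliest culprit and the cheapest thing to undo, so it is the natural first candidate for a rollback before any deeper debugging.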
Step zero is to take a deep breath. Force yourself to slow down a little. I prefer to count to six while inhaling, count to six again while holding my breath...