Measuring the downtime with the MTTR
The MTTR is the average amount of time an issue takes to be resolved. It is calculated by adding up the resolution time of every outage in a given period and dividing by the number of outages. The MTTR is often thought of as response time: how quickly a fire department can get to your fire and put it out. It is also roughly how long each outage impacts us, so when the MTTR goes up, we often see a decrease in revenue and customer satisfaction. The MTTR is made up of several smaller elements, each contributing to the overall outage time. Let’s step through a typical outage and quickly examine each of these elements:
- Detection time: The time between the start of the outage and the moment someone notices it. The clock starts when the root cause occurs and runs until the first person or automated notification reports that something is wrong.
- Notification time: The time it takes between detection and when engineering assets first respond. This could be the time it takes for someone to...