Measuring metrics that matter
To measure where you are on your transformation journey, it's best to focus on the four metrics that are used in DORA—two for performance and two for stability, as follows:
- Delivery performance metrics:
- Delivery lead time
- Deployment frequency
- Stability metrics:
- Mean time to restore
- Change fail rate
Delivery lead time
The delivery lead time (DLT) is the time from when your engineers start working on a feature until the feature is available to the end users. You could say from code commit to production—but you normally start the clock when the team starts to work on a requirement and changes the state of it to doing or something similar.
It is not easy to get this metric automated from the system. I will show you in Chapter 7, Running Your Workflows, how you can use GitHub Actions and Projects together to automate the metric. If you don't get the metric out of the system, you can set up a survey with the following options:
- Less than 1 hour
- Less than 1 day
- Less than 1 week
- Less than 1 month
- Less than 6 months
- More than 6 months
Depending on where you are on the scale, you conduct the survey more or less often. Of course, system-generated values would be preferable, but if you are on the upper steps of that scale (months), it doesn't matter. It gets more interesting if you measure hours or days.
Why not lead time?
From a Lean management perspective, the LT would be the better metric: how long does a learning from customer feedback flow through the entire system? But requirements in software engineering are difficult. Normally, a lot of steps are involved before the actual engineering work begins. The outcome could vary a lot and the metric is hard to guess if you must rely on survey data. Some requirements could stay for months in the queue—some, only a few hours. From an engineering perspective, it's much better to focus on DLT. You will learn more about LT in Chapter 18, Lean Product Development and Lean Startup.
Deployment frequency
The deployment frequency focuses on speed. How long does it take to deliver your changes? A metric that focuses more on throughput is the DF. How often do you deploy your changes to production? The DF indicates your batch size. In Lean manufacturing, it is desirable to reduce the batch size. A higher DF would indicate a smaller batch size.
At first glance, it looks easy to measure DF in your system. But at a closer look, how many of your deployments really make it to production? In Chapter 7, Running Your Workflows, I will explain how you can capture the metric using GitHub Actions.
If you can't measure the metric yet, you can also use a survey. Use the following options:
- On-demand (multiple times per day)
- Between once per hour and once per day
- Between once per day and once per week
- Between once per week and once per month
- Between once per month and once every 6 months
- Less than every 6 months
Mean time to restore
A good measure for stability is the mean time to restore (MTTR). This measures how long it takes to restore your product or service if you have an outage. If you measure your uptime, it is basically the time span in which your service is not available. To measure your uptime, you can use a smoke test—for example, in Application Insights (see https://docs.microsoft.com/en-us/azure/azure-monitor/app/monitor-web-app-availability). If your application is installed on client machines and not accessible, it's more complicated. Often, you can fall back on the time for a specific ticket type in your helpdesk system.
If you can't measure it at all, you can still fall back to a survey with the following options:
- Less than 1 hour
- Less than 1 day
- Less than 1 week
- Less than 1 month
- Less than 6 months
- More than 6 months
But this should only be the last resort. The MTTR should be a metric you should easily get out of your systems.
Change fail rate
As with DLT for performance, MTTR is the metric for time when it comes to stability. The pendant of DF that focuses on throughput is the change fail rate (CFR). For the question How many of your deployments cause a failure in production?, the CFR is specified as a percentage. To decide which of your deployments count toward this metric, you should use the same definition as for the DF.
The Four Keys dashboard
These four metrics based upon the DORA research are a great way to measure where you are on your DevOps journey. They are a good starting point to change your conversations with management. Put them on a dashboard and be proud of them. And don't worry if you're not yet an elite performer—the important thing is to be on the journey and to improve continuously.
It's very simple to start with survey-based values. But if you want to use automatically generated system data you can use the Four Keys Project to display the data in a nice dashboard, (see Figure 1.4).
The project is open source and based upon Google Cloud (see https://github.com/GoogleCloudPlatform/fourkeys), but it depends on webhooks to get the data from your tools. You will learn in Chapter 7, Running Your Workflows, how to use webhooks to send your data to the dashboard.
What you shouldn't do
It is important that these metrics are not used to compare teams with each other. You can aggregate them to get an organizational overview, but don't compare individual teams! Every team has different circumstances. It's only important that the metrics evolve in the right direction.
Also, the metrics should not become the goal. It is not desirable to just get better metrics. The focus should always be on the capabilities that lead to these metrics and that we discuss in this book. Focus on these capabilities and the metrics will follow.