Real-World SRE

You're reading from Real-World SRE: The Survival Guide for Responding to a System Outage and Maximizing Uptime

Product type Paperback
Published in Aug 2018
Publisher Packt
ISBN-13 9781788628884
Length 340 pages
Edition 1st Edition
Author: Nat Welch

Analyzing past postmortems


Once everything is said and done, it is worth going back to review past postmortems. Once a quarter or once a year, collect all of your postmortems and try to pull together some metrics. These metrics can give you insight into how your team responds to incidents:

  • Time to recovery

  • Time between failures

  • Number of alerts fired versus postmortems generated

  • Number of alerts fired per on-call rotation
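The last two metrics are simple counts once you have an export of alerts. The following minimal sketch tallies alerts per on-call rotation and divides alerts fired by postmortems written. The `Alert` record, its fields, and the rotation labels such as `"week-14"` are hypothetical. In practice this data would come from whatever paging or alerting system your team uses.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Alert:
    """One page sent to on-call. Field names are illustrative assumptions."""
    fired_at: datetime
    rotation: str  # hypothetical label for the on-call rotation, e.g. "week-14"


def alerts_per_rotation(alerts: list[Alert]) -> dict[str, int]:
    """Count how many alerts fired during each on-call rotation."""
    return dict(Counter(a.rotation for a in alerts))


def alerts_per_postmortem(alert_count: int, postmortem_count: int) -> float:
    """Ratio of alerts fired to postmortems generated (inf if no postmortems)."""
    if postmortem_count == 0:
        return float("inf")
    return alert_count / postmortem_count


# Usage with made-up data: three alerts across two rotations, one postmortem.
alerts = [
    Alert(datetime(2018, 4, 2, 3, 15), "week-14"),
    Alert(datetime(2018, 4, 3, 11, 0), "week-14"),
    Alert(datetime(2018, 4, 10, 22, 40), "week-15"),
]
print(alerts_per_rotation(alerts))          # {'week-14': 2, 'week-15': 1}
print(alerts_per_postmortem(len(alerts), 1))  # 3.0
```

A high alerts-to-postmortems ratio is not automatically bad, since many alerts are resolved without warranting a postmortem. The trend from quarter to quarter is usually more telling than any single value.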

MTTR and MTBF

Outside of individual incidents, two metrics that are often discussed are mean time to recovery (MTTR) and mean time between failures (MTBF). Looking at these numbers across a year can show whether your ability to respond to incidents is improving or changing. Note that the goal is to minimize the time until recovery, not necessarily the time until the cause of the outage is fixed. If MTBF is low, meaning failures happen frequently, your team might not be investing enough in testing, and the constant incidents are probably draining your team as well. If MTTR is high, it probably means...
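Both averages can be computed from a list of incident start and recovery times. The following is a minimal sketch under stated assumptions. The `Incident` record is hypothetical. `recovered_at` marks when service was restored rather than when the root cause was fixed. MTBF is measured from one incident's recovery to the start of the next incident. Some teams measure start to start instead, so pick one definition and keep it consistent from year to year.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass
class Incident:
    """Hypothetical incident record pulled from a postmortem."""
    started_at: datetime    # when the failure began (or was first detected)
    recovered_at: datetime  # when service was restored, not when the cause was fixed


def mttr(incidents: list[Incident]) -> timedelta:
    """Mean time to recovery: the average of (recovered_at - started_at)."""
    if not incidents:
        raise ValueError("MTTR needs at least one incident")
    seconds = mean((i.recovered_at - i.started_at).total_seconds() for i in incidents)
    return timedelta(seconds=seconds)


def mtbf(incidents: list[Incident]) -> timedelta:
    """Mean time between failures, measured (by assumption) from one incident's
    recovery to the next incident's start. Needs at least two incidents."""
    if len(incidents) < 2:
        raise ValueError("MTBF needs at least two incidents")
    ordered = sorted(incidents, key=lambda i: i.started_at)
    gaps = [
        (nxt.started_at - prev.recovered_at).total_seconds()
        for prev, nxt in zip(ordered, ordered[1:])
    ]
    return timedelta(seconds=mean(gaps))


# Usage with made-up data: outages of 1h, 3h, and 2h, each 10 days after the last recovery.
incidents = [
    Incident(datetime(2018, 1, 1, 0, 0), datetime(2018, 1, 1, 1, 0)),
    Incident(datetime(2018, 1, 11, 1, 0), datetime(2018, 1, 11, 4, 0)),
    Incident(datetime(2018, 1, 21, 4, 0), datetime(2018, 1, 21, 6, 0)),
]
print(mttr(incidents))  # 2:00:00
print(mtbf(incidents))  # 10 days, 0:00:00
```

Computing these per quarter and plotting them side by side makes it easier to see whether recovery is getting faster and failures less frequent over the year.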
