Becoming a Rockstar SRE

You're reading from Becoming a Rockstar SRE: Electrify your site reliability engineering mindset to build reliable, resilient, and efficient systems

Product type Paperback
Published in Apr 2023
Publisher Packt
ISBN-13 9781803239224
Length 420 pages
Edition 1st Edition
Authors (2): Jeremy Proffitt, Rod Anami L. Anami

Describing an SRE’s main responsibilities

We hope the mission and scope of the SRE job role are less foggy at this point. As an SRE, what would you be responsible for? In this section, we will investigate the most common duties that SREs are accountable for. We’ve divided these responsibilities into two groups:

  • Operational work responsibilities
  • Engineering work responsibilities

Let’s start by reviewing the operational group first.

Operational work responsibilities

Site reliability engineers have work duties related to the process of managing systems. Such tasks are called operational work. SREs are not only accountable for operational work alongside the operations team; they also have the authority to execute the associated management processes.

First, they are responsible for the ITIL® processes, including incident, problem, and change management. That means they actively participate in on-call schedules as first responders when critical services go down. They need to isolate the faulty components of the service, troubleshoot the causes of the component issues, repair them or provide a workaround, restore the affected service to nominal performance, and verify that the service has been restored from the user’s perspective. After significant service disruptions, SREs must determine the root causes and contributing factors. They also implement change requests to the systems, backing services, delivery pipelines, integrations, infrastructure, and applications.

Second, they are accountable for maintaining systems, services, applications, and infrastructure. They may need to patch a bug in production or assist the development team. SREs may have to deploy a new software version using a canary release, A/B testing, or blue-green deployment.
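
To make the canary release idea concrete, here is a minimal sketch of one way to send a fixed share of users to a new version. The function name `pick_version`, the labels `"canary"` and `"stable"`, and the hash-based bucketing are assumptions made for illustration, not something the chapter prescribes; real rollouts are usually handled by a load balancer or service mesh.

```python
import hashlib


def pick_version(user_id: str, canary_percent: int) -> str:
    """Return which release a user should see (hypothetical helper).

    Hashing the user ID keeps the decision stable: the same user lands
    in the same bucket on every request, so their experience does not
    flip between versions while the canary is running.
    """
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # bucket in 0..99
    return "canary" if bucket < canary_percent else "stable"


# Start small, watch the metrics, then raise the percentage.
for user in ("alice", "bob", "carol"):
    print(user, pick_version(user, canary_percent=10))
```

Setting `canary_percent` to 0 sends everyone to the stable version, which acts as a simple rollback switch in this sketch.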

Third, SREs have the responsibility of taking care of the observability platform. That includes installing, configuring, maintaining, and monitoring the observability tools. Yes, we monitor the monitoring.
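
One simple way to "monitor the monitoring" is a heartbeat check: the observability platform reports in periodically, and an independent check raises a flag if it goes silent. The sketch below assumes a hypothetical `monitoring_is_alive` helper and a five-minute silence threshold; both are illustrative choices rather than recommendations from the chapter.

```python
from datetime import datetime, timedelta, timezone


def monitoring_is_alive(
    last_heartbeat: datetime,
    now: datetime,
    max_silence: timedelta = timedelta(minutes=5),  # assumed threshold
) -> bool:
    """Return True if the monitoring system reported in recently enough."""
    return now - last_heartbeat <= max_silence


now = datetime(2023, 4, 1, 12, 0, tzinfo=timezone.utc)
recent = now - timedelta(minutes=2)
stale = now - timedelta(minutes=30)

print(monitoring_is_alive(recent, now))  # True
print(monitoring_is_alive(stale, now))   # False
```

The check should run somewhere that does not depend on the monitoring platform itself; otherwise both would fail together.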

Engineering work responsibilities

SREs do engineering work to reach higher levels of availability, resiliency, performance, quality, and scalability in a system. They work on each configuration item or component to increase its reliability. By improving the reliability of each component, the overall system delivers more trustworthy services and is better able to meet its SLOs.

Site reliability engineers are responsible for reliability metrics, such as the mean time to detect (MTTD) and mean time to repair (MTTR). MTTD indicates how fast the monitoring system can detect a service problem or an anomaly that will lead to a problem if nothing is done. MTTR indicates how swiftly an incident is repaired after it’s detected. Those metrics make SREs accountable for the effectiveness of the observability platform and tools, and of the runbook documentation.
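
As a rough illustration of how these two metrics can be computed from incident records, consider the sketch below. The `Incident` class, its field names, and the sample timestamps are hypothetical. Following the description above, MTTD is measured from the start of the fault to its detection, and MTTR from detection to repair.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Incident:
    """Hypothetical incident record; field names are illustrative."""
    started: datetime   # when the fault began affecting the service
    detected: datetime  # when monitoring raised the alert
    repaired: datetime  # when the service was restored


def mean_time_to_detect(incidents: list[Incident]) -> timedelta:
    """Average delay between the start of a fault and its detection."""
    if not incidents:
        raise ValueError("need at least one incident")
    total = sum((i.detected - i.started for i in incidents), timedelta())
    return total / len(incidents)


def mean_time_to_repair(incidents: list[Incident]) -> timedelta:
    """Average delay between detection and repair."""
    if not incidents:
        raise ValueError("need at least one incident")
    total = sum((i.repaired - i.detected for i in incidents), timedelta())
    return total / len(incidents)


# Made-up sample data: detection after 4 and 6 minutes,
# repair 30 and 50 minutes after detection.
incidents = [
    Incident(datetime(2023, 4, 1, 10, 0), datetime(2023, 4, 1, 10, 4),
             datetime(2023, 4, 1, 10, 34)),
    Incident(datetime(2023, 4, 9, 2, 0), datetime(2023, 4, 9, 2, 6),
             datetime(2023, 4, 9, 2, 56)),
]
print(mean_time_to_detect(incidents))  # 0:05:00
print(mean_time_to_repair(incidents))  # 0:40:00
```

A falling MTTD suggests the observability platform is catching problems sooner, while a falling MTTR points to better runbooks and faster response.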

The mean time between failures (MTBF) is another reliability metric under SRE accountability. It indicates how much time elapses, on average, between one system failure and the next. SREs must adopt the blameless postmortem principle to improve this metric every time a failure happens. Those postmortems translate into multiple reliability enhancements to different parts of the system.
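
A minimal sketch of an MTBF calculation follows. It assumes MTBF is taken as the average interval between the start times of consecutive failures; some definitions instead count only the uptime between failures and exclude repair time. The function name and the sample dates are made up for illustration.

```python
from datetime import datetime, timedelta


def mean_time_between_failures(failure_times: list[datetime]) -> timedelta:
    """Average gap between consecutive failure start times."""
    if len(failure_times) < 2:
        raise ValueError("need at least two failures to measure an interval")
    ordered = sorted(failure_times)
    # Pair each failure with the next one and measure the gap between them.
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    return sum(gaps, timedelta()) / len(gaps)


# Made-up failure start times for one service.
failures = [
    datetime(2023, 4, 1, 10, 0),
    datetime(2023, 4, 9, 2, 0),
    datetime(2023, 4, 21, 18, 0),
]
print(mean_time_between_failures(failures))  # 10 days, 4:00:00
```

In this sample the gaps grow from about 7.7 days to about 12.7 days, which is the kind of trend a series of effective postmortem actions would be expected to produce.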

SREs are accountable for toil management. The less toil we have in systems management, the better the metrics mentioned previously. Site reliability engineers work tirelessly to detect and eliminate repetitive tasks devoid of business value.

We described the ordinary responsibilities of an SRE with the intent of giving you an idea of what to expect in this career. Of course, this is not a comprehensive list of duties, nor is it intended to limit their responsibilities. As long as they work to fulfill the guiding principles, they are doing SRE work. Next, we will review the activities that SREs carry out daily.
