Becoming a Rockstar SRE

You're reading from Becoming a Rockstar SRE: Electrify your site reliability engineering mindset to build reliable, resilient, and efficient systems

Product type Paperback
Published in Apr 2023
Publisher Packt
ISBN-13 9781803239224
Length 420 pages
Edition 1st Edition
Authors (2): Jeremy Proffitt, Rod Anami L. Anami

An overview of the daily activities of an SRE

Now that we have examined SRE responsibilities, it’s time to look at what you, as an SRE, will do on a regular basis. There’s no better way to understand a profession than to ask what its practitioners actually do. When you go to a job interview, you probably want to know which activities a person in that position carries out. SREs tend to keep a list of assignments as sticky notes on their displays. We have separated the notable activities into two groups:

  • Reactive work activities
  • Proactive work activities

We’ll start by understanding reactive activities.

Reactive work activities

SREs execute many tasks that don’t lift (or shift) system reliability directly; these are usually operational work. Nevertheless, such activities either reduce service downtime or mitigate risks. Examples of tasks that SREs perform daily in this category are as follows:

  • Repair or restore a system or multiple services to their original state
  • Follow and execute instructions from a runbook (standard operating procedure) during an incident to diagnose the application
  • Implement a change request to apply a patch to a software component
  • Attend a meeting to run a postmortem with system administrators and developers about the recent service or system outage
  • Install a new Kubernetes cluster for a new application according to the development team’s specifications and enable monitoring of it
  • Configure a new cloud-based service for a new application following the architecture design and include it in cloud monitoring
  • Deploy a new software release to VMs and execute the testing scripts

Proactive work activities

SREs also carry out tasks that improve the quality, scalability, observability, manageability, resiliency, or availability of a system or service. Since those tasks increase the reliability levels of specific systems or services, they are considered proactive, mostly engineering-type work. Such assignments typically reduce toil and technical debt. Examples of this category are as follows:

  • Maintain a runbook on how to diagnose problems with a specific application
  • Design and develop automation that executes procedures previously documented in a runbook
  • Establish, together with the DevOps team, the release strategy, such as a canary release, A/B testing, or blue-green deployment
  • Work with the software engineers (SWEs) to add management code to the application so SREs can instruct it to perform self-administration or self-healing operations
  • Work with the development team to adopt an immutable infrastructure philosophy into the application-building process
  • Instrument the application code to increase its observability with logs and traces
  • Design and implement observability to obtain good metrics, events, logs, and traces from a critical application
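
Two items from the list above, automating a runbook and instrumenting code with logs, can be illustrated together. The following is a minimal Python sketch, not a production design: the `Step` type, the `run_runbook` function, the check functions, and the `checkout-api-diagnosis` name are all hypothetical and stand in for whatever diagnostics a real runbook would contain.

```python
import logging
import time
from dataclasses import dataclass
from typing import Callable, List

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("runbook")


@dataclass
class Step:
    """One runbook instruction: a description plus a check returning True on success."""
    description: str
    action: Callable[[], bool]


def run_runbook(name: str, steps: List[Step]) -> bool:
    """Run steps in order and stop at the first failure so a human can take over."""
    for number, step in enumerate(steps, start=1):
        started = time.perf_counter()
        try:
            ok = bool(step.action())
        except Exception:
            # A step that raises is treated as failed; the traceback goes to the log.
            log.exception("%s step %d raised: %s", name, number, step.description)
            ok = False
        elapsed_ms = (time.perf_counter() - started) * 1000
        log.info("%s step %d (%s): ok=%s in %.1f ms",
                 name, number, step.description, ok, elapsed_ms)
        if not ok:
            return False
    return True


# Hypothetical checks standing in for real diagnostics (disk, process, endpoint).
def disk_has_space() -> bool:
    return True


def service_is_running() -> bool:
    return False


if __name__ == "__main__":
    healthy = run_runbook("checkout-api-diagnosis", [
        Step("verify free disk space", disk_has_space),
        Step("verify service process is running", service_is_running),
        Step("verify health endpoint", lambda: True),
    ])
    print("runbook passed" if healthy else "runbook stopped; escalate to on-call")
```

Stopping at the first failed step is a deliberate choice in this sketch: it keeps the automation from acting on a system whose state is already unexpected, and the per-step log lines give the on-call engineer a record of what was checked and how long each check took.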

Note

Site reliability engineers perform many more activities than the ones listed here. This is not a comprehensive list; the only intention is to show you how SREs work across multiple dimensions and aspects of systems and services.

We have listed what an SRE does frequently to give you a good sense of their day-to-day activities and how they differ from those of other roles. Again, this is not a complete or closed list. We will close this chapter by telling you who our SRE rockstars are.
