You're reading from Becoming a Rockstar SRE: Electrify your site reliability engineering mindset to build reliable, resilient, and efficient systems

Product type Paperback

Published in Apr 2023

Publisher Packt

ISBN-13 9781803239224

Length 420 pages

Edition 1st Edition

Languages Python

Tools Argo CD

Concepts DevOps

Authors (2):

Jeremy Proffitt

Rod Anami


Table of Contents

Preface

Part 1 - Understanding the Basics of Who, What, and Why

Chapter 1: SRE Job Role – Activities and Responsibilities

Chapter 2: Fundamental Numbers – Reliability Statistics

Chapter 3: Imperfect Habits – Duct Tape Architecture and Spaghetti Code

Part 2 - Implementing Observability for Site Reliability Engineering

Chapter 4: Essential Observability – Metrics, Events, Logs, and Traces (MELT)

Chapter 5: Resolution Path – Master Troubleshooting

Chapter 6: Operational Framework – Managing Infrastructure and Systems

Chapter 7: Data Consumed – Observability Data Science

Part 3 - Applying Architecture for Reliability

Chapter 8: Reliable Architecture – Systems Strategy and Design

Chapter 9: Valued Automation – Toil Discovery and Elimination

Chapter 10: Exposing Pipelines – GitOps and Testing Essentials

Chapter 11: Worker Bees – Orchestrations of Serverless, Containers, and Kubernetes

Chapter 12: Final Exam – Tests and Capacity Planning

Part 4 - Mastering the Outage Moments

Chapter 13: First Thing – Runbooks and Low Noise Outage Notifications

Chapter 14: Rapid Response – Outage Management Techniques

Chapter 15: Postmortem Candor – Long-Term Resolution

Part 5 - Looking into Future Trends and Preparing for SRE Interviews

Chapter 16: Chaos Injector – Advanced Systems Stability

Chapter 17: Interview Advice – Hiring and Being Hired

Index

Why subscribe?

Other Books You May Enjoy

Appendix A – The Site Reliability Engineer Manifesto

Appendix B – The 12-Factor App Questionnaire

Eliminating toil

Site reliability engineering disciplines fill the systems management gaps left by the growing complexity of solutions in hybrid multicloud environments. Complexity hinders the scalability and reliability of systems by adding unnecessary burden to every operation. Keeping things simple by eliminating repetitive tasks is one of the fundamental purposes of the SRE role. To understand how SREs accomplish this mission, we’ll divide this section into three parts:

Toil redefined
Why toil is bad
Handling toil the right way

Next, we’ll redefine what toil is in the site reliability engineering context.

Toil redefined

Google defines toil as “the kind of work tied to running a production service that tends to be manual, repetitive, automatable, tactical, devoid of enduring value, and that scales linearly as a service grows.” For a long time, we used this definition to target...
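The six traits in Google's definition can be read as a checklist. The following is a minimal Python sketch of that reading, not a method taken from the book: the `ToilTraits` class, the `toil_score` function, and the sample task are all hypothetical names introduced for illustration, and treating every trait as equally weighted is an assumption.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ToilTraits:
    """One flag per trait named in Google's definition of toil (hypothetical model)."""
    manual: bool
    repetitive: bool
    automatable: bool
    tactical: bool
    no_enduring_value: bool
    scales_linearly: bool


def matched_traits(task: ToilTraits) -> list[str]:
    """Return the names of the traits that the task exhibits."""
    return [f.name for f in fields(task) if getattr(task, f.name)]


def toil_score(task: ToilTraits) -> float:
    """Fraction of the six traits the task exhibits (equal weighting is an assumption)."""
    return len(matched_traits(task)) / len(fields(task))


# Hypothetical task: restarting a stuck worker by hand after each alert.
restart_worker = ToilTraits(
    manual=True,
    repetitive=True,
    automatable=True,
    tactical=True,
    no_enduring_value=True,
    scales_linearly=False,
)

print(matched_traits(restart_worker))
print(f"{toil_score(restart_worker):.2f}")
```

A task that matches most of the traits is a stronger candidate for automation than one that matches only one or two, although where to draw that line is a judgment call for each team.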
