You're reading from Becoming a Rockstar SRE: Electrify your site reliability engineering mindset to build reliable, resilient, and efficient systems

Product type Paperback

Published in Apr 2023

Publisher Packt

ISBN-13 9781803239224

Length 420 pages

Edition 1st Edition

Languages

Python

Tools

Argo CD

Concepts

DevOps

Authors (2):

Jeremy Proffitt

Rod Anami

Table of Contents (27) Chapters

Preface

1. Part 1 - Understanding the Basics of Who, What, and Why

2. Chapter 1: SRE Job Role – Activities and Responsibilities

3. Chapter 2: Fundamental Numbers – Reliability Statistics

4. Chapter 3: Imperfect Habits – Duct Tape Architecture and Spaghetti Code

5. Part 2 - Implementing Observability for Site Reliability Engineering

6. Chapter 4: Essential Observability – Metrics, Events, Logs, and Traces (MELT)

7. Chapter 5: Resolution Path – Master Troubleshooting

8. Chapter 6: Operational Framework – Managing Infrastructure and Systems

9. Chapter 7: Data Consumed – Observability Data Science

10. Part 3 - Applying Architecture for Reliability

11. Chapter 8: Reliable Architecture – Systems Strategy and Design

12. Chapter 9: Valued Automation – Toil Discovery and Elimination

13. Chapter 10: Exposing Pipelines – GitOps and Testing Essentials

14. Chapter 11: Worker Bees – Orchestrations of Serverless, Containers, and Kubernetes

15. Chapter 12: Final Exam – Tests and Capacity Planning

16. Part 4 - Mastering the Outage Moments

17. Chapter 13: First Thing – Runbooks and Low Noise Outage Notifications

18. Chapter 14: Rapid Response – Outage Management Techniques

19. Chapter 15: Postmortem Candor – Long-Term Resolution

20. Part 5 - Looking into Future Trends and Preparing for SRE Interviews

21. Chapter 16: Chaos Injector – Advanced Systems Stability

22. Chapter 17: Interview Advice – Hiring and Being Hired

23. Index

Why subscribe?

24. Other Books You May Enjoy

Appendix A – The Site Reliability Engineer Manifesto

Appendix B – The 12-Factor App Questionnaire

To get the most out of this book

We purposefully used SRE as the acronym for site reliability engineer and kept site reliability engineering in its extended form throughout the book. For us, site reliability engineering can only be accomplished if you have an SRE, and not the other way around. Although it’s common to see SRE stand for both site reliability engineer and site reliability engineering interchangeably, we want to emphasize the persona, the who, in this book.

This book contains simulation labs to give readers practical knowledge. Each lab has its own prerequisite knowledge, such as Kubernetes, cloud computing, or software development. This book does not set out to teach specific technologies and products; it teaches the most effective practices and principles, which are technology agnostic. However, we must adopt some technology to demonstrate site reliability engineering concepts and techniques. For the labs, we preferred open source software and platforms with free tier accounts.

Each simulation lab states its learning requirements and points to where the reader can find more information and instructions. We divided each practical exercise into three parts:

Lab architecture
Lab contents
Lab instructions

The lab architecture explains the big picture: the design of the lab and the connections among its main components. The lab contents section explains what’s inside the GitHub repository, such as files and folders. The lab instructions provide a procedure for installing, configuring, and using the lab properly.

The following is a list of software covered in this book’s simulation labs and the required execution environment:

Software covered in the book	Cloud platform requirements
GitHub	Google Cloud Platform account
Kubernetes	AWS account (alternative)
Node.js	Microsoft Azure account (alternative)
Prometheus	Google Kubernetes Engine (GKE)
Grafana	Google Compute Engine (GCE)
Terraform	Amazon Elastic Kubernetes Service (alternative)
Python	Azure Kubernetes Service (alternative)
Golang
Slack
PagerDuty
Grype
Syft
Argo CD
Grafana k6
LitmusChaos

You will require a laptop with reasonable access to the internet to work in the book’s labs.

If you are using the digital version of this book, we advise you to type the code yourself or access the code from the book’s GitHub repository (a link is available in the next section). Doing so will help you avoid any potential errors related to the copying and pasting of code.
