Google Cloud for DevOps Engineers: A practical guide to SRE and achieving Google's Professional Cloud DevOps Engineer certification

Product type Paperback

Published in Jul 2021

Publisher Packt

ISBN-13 9781839218019

Length 482 pages

Edition 1st Edition

Tools Kubernetes

Concepts DevOps

Author (1):

Sandeep Madamanchi

Table of Contents (17) Chapters

Preface

1. Section 1: Site Reliability Engineering – A Prescriptive Way to Implement DevOps

2. Chapter 1: DevOps, SRE, and Google Cloud Services for CI/CD

3. Chapter 2: SRE Technical Practices – Deep Dive

4. Chapter 3: Understanding Monitoring and Alerting to Target Reliability

5. Chapter 4: Building SRE Teams and Applying Cultural Practices

6. Section 2: Google Cloud Services to Implement DevOps via CI/CD

7. Chapter 5: Managing Source Code Using Cloud Source Repositories

8. Chapter 6: Building Code Using Cloud Build, and Pushing to Container Registry

9. Chapter 7: Understanding Kubernetes Essentials to Deploy Containerized Applications

10. Chapter 8: Understanding GKE Essentials to Deploy Containerized Applications

11. Chapter 9: Securing the Cluster Using GKE Security Constructs

12. Chapter 10: Exploring GCP Cloud Operations

13. Mock Exam 1

Answers

14. Mock Exam 2

15. Other Books You May Enjoy

Appendix: Getting Ready for Professional Cloud DevOps Engineer Certification

Incident management

Incident management is one of the key responsibilities of a site reliability engineer (SRE). An incident is an event that indicates a possible issue with a service or an application. In the best case the issue is minor; in the worst case it is an outage. An incident can be triggered by an alert that was set up as part of monitoring the service or application.

An alert is an indication that the SLOs for the service are being violated or are on track to be violated. Sometimes, especially for an external-facing application, an incident is triggered by an end user complaining on social media. Such incidents call for an additional layer of retrospection into how or why the existing alerting system failed to identify the incident.
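The difference between an SLO that is "being violated" and one that is "on track to be violated" can be illustrated with a small sketch. The Python below is not taken from the book: the names `SloWindow`, `burn_rate`, and `alert_state`, the request-based availability SLO, and the `at_risk_threshold` value of 2.0 are all assumptions chosen for illustration. It treats the error budget as `1 - slo_target` and compares the observed failure ratio against it.

```python
from dataclasses import dataclass


@dataclass
class SloWindow:
    """Request counts observed over one window (hypothetical shape)."""
    total_requests: int
    failed_requests: int


def burn_rate(window: SloWindow, slo_target: float) -> float:
    """Return the observed error ratio divided by the allowed error ratio.

    A value of 1.0 means errors arrive exactly as fast as the budget allows;
    a value above 1.0 means the budget is being spent too quickly.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if window.total_requests == 0:
        return 0.0  # no traffic, so no budget was consumed
    error_budget = 1.0 - slo_target  # allowed failure ratio
    observed = window.failed_requests / window.total_requests
    return observed / error_budget


def alert_state(period: SloWindow, recent: SloWindow, slo_target: float,
                at_risk_threshold: float = 2.0) -> str:
    """Classify the SLO as 'violated', 'at_risk', or 'ok'.

    period: counts for the whole SLO compliance period so far.
    recent: counts for a short, recent window (for example, the last hour).
    at_risk_threshold: assumed burn rate at which to warn early.
    """
    if burn_rate(period, slo_target) > 1.0:
        return "violated"  # the budget for the period is already overspent
    if burn_rate(recent, slo_target) >= at_risk_threshold:
        return "at_risk"   # recent errors would exhaust the budget early
    return "ok"


if __name__ == "__main__":
    period = SloWindow(total_requests=1_000_000, failed_requests=500)
    recent = SloWindow(total_requests=10_000, failed_requests=50)
    print(alert_state(period, recent, slo_target=0.999))
```

In this sketch an "at_risk" result corresponds to an alert that fires before the SLO is actually missed, which is what gives an incident response time to limit the disruption. Real alerting policies usually combine several windows and thresholds; the single threshold here is a simplification.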

Effective incident management is a critical SRE cultural practice that is key to limiting the disruption caused by an incident and is critical...