Reliability
There are five design principles for reliability in the cloud:
- Automating recovery from failure
- Testing recovery procedures
- Scaling horizontally to increase aggregate workload availability
- Stopping guessing capacity
- Managing changes in automation
Automating recovery from failure
When you think of automating recovery from failure, the first thing that comes to mind for most people is a technology solution. However, that is not necessarily the context being referred to in the Reliability pillar. The thresholds that signal a failure, and therefore trigger recovery, should be based on Key Performance Indicators (KPIs) set by the business.
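For the technology side, a common pattern is to alarm on a health metric and let the platform recover the resource automatically. Here is a minimal boto3 sketch of that idea, assuming a placeholder EC2 instance ID and Region:

```python
import boto3

# Placeholder values -- substitute your own Region and instance ID.
REGION = "us-east-1"
INSTANCE_ID = "i-0123456789abcdef0"

cloudwatch = boto3.client("cloudwatch", region_name=REGION)

# Recover the instance automatically when the system status check
# fails for two consecutive one-minute periods.
cloudwatch.put_metric_alarm(
    AlarmName="ec2-auto-recover",
    Namespace="AWS/EC2",
    MetricName="StatusCheckFailed_System",
    Dimensions=[{"Name": "InstanceId", "Value": INSTANCE_ID}],
    Statistic="Maximum",
    Period=60,
    EvaluationPeriods=2,
    Threshold=1.0,
    ComparisonOperator="GreaterThanOrEqualToThreshold",
    # The 'recover' action moves the instance to healthy hardware.
    AlarmActions=[f"arn:aws:automate:{REGION}:ec2:recover"],
)
```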
As part of the recovery process, it's important to know both the Recovery Time Objective (RTO) and Recovery Point Objective (RPO) of the organization or workload:
- RTO (Recovery Time Objective): The maximum acceptable delay between the interruption of service and its restoration
- RPO (Recovery Point Objective): The maximum acceptable amount of time since the last data recovery point (backup) (Amazon Web Services, 2021)
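To make the RPO concrete, consider a hypothetical workload backed up every four hours: the worst-case data loss after a failure is the full interval between backups. A small sketch of that check, with assumed numbers:

```python
# Hypothetical targets -- an RPO of 1 hour against a 4-hour backup schedule.
rpo_hours = 1.0
backup_interval_hours = 4.0

# Worst case, the failure happens just before the next backup runs,
# so the maximum possible data loss equals the backup interval.
worst_case_data_loss_hours = backup_interval_hours

if worst_case_data_loss_hours > rpo_hours:
    print(
        f"Backup interval ({backup_interval_hours}h) violates the "
        f"RPO ({rpo_hours}h); back up at least every {rpo_hours}h."
    )
```

The same comparison applies to the RTO: the restoration time measured in a recovery test must come in under the business's RTO target.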
Testing recovery procedures
In your cloud environment, you should not only test that your workload functions properly, but also that it can recover from single or multiple component failures, whether those occur at the service, Availability Zone, or Regional level.
Using practices such as Infrastructure as Code, CD pipelines, and regional backups, you can quickly spin up an entirely new environment, including your application and infrastructure layers. This gives you the ability to verify that things work the same as in the current production environment and that data is restored correctly. You can also time how long the restoration takes and work to shorten it through automation.
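One way to measure restoration time is to restore into a throwaway environment and time it. Here is a minimal boto3 sketch along those lines, assuming a hypothetical RDS snapshot; the identifiers are placeholders:

```python
import time

import boto3

# Placeholder identifiers for the snapshot and the test instance.
SNAPSHOT_ID = "prod-db-nightly-snapshot"
TEST_INSTANCE_ID = "prod-db-restore-test"

rds = boto3.client("rds", region_name="us-east-1")

start = time.monotonic()

# Restore a throwaway instance from the snapshot.
rds.restore_db_instance_from_db_snapshot(
    DBInstanceIdentifier=TEST_INSTANCE_ID,
    DBSnapshotIdentifier=SNAPSHOT_ID,
)

# Block until the restored instance is available, then record the duration.
waiter = rds.get_waiter("db_instance_available")
waiter.wait(DBInstanceIdentifier=TEST_INSTANCE_ID)

elapsed_minutes = (time.monotonic() - start) / 60
print(f"Restore completed in {elapsed_minutes:.1f} minutes")
```

Run on a schedule from a pipeline, a check like this turns the recovery test itself into a repeatable, measurable process.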
Taking the proactive measure of documenting each of the necessary steps in a runbook or playbook allows for knowledge sharing, as well as fewer dependencies on specific team members who built the systems and processes.
Scaling horizontally to increase aggregate workload availability
When coming from a data center environment, planning for peak capacity means finding a machine that can run all the different components of your application. Once you hit the maximum resources for that machine, you need to move to a bigger machine.
As you move from development to production, or as your product or service grows in popularity, you will need to scale your resources. There are two main methods for achieving this: scaling vertically or scaling horizontally.
One of the main issues with scaling vertically is that, as you move to larger and larger instances, you will eventually hit a ceiling. At some point, there is no bigger instance to move up to, or the larger instance is cost-prohibitive to run.
Scaling horizontally, on the other hand, allows you to add the capacity you need, when you need it, in a cost-effective manner.
Moving to a cloud mindset means decoupling your application components, placing multiple groupings of the same servers behind a load balancer or having them pull work from a queue, and scaling the number of servers up and down based on current demand.
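As an example of scaling on demand, here is a minimal boto3 sketch that attaches a target tracking scaling policy to a hypothetical Auto Scaling group; the group name and the 50% CPU target are assumptions:

```python
import boto3

autoscaling = boto3.client("autoscaling", region_name="us-east-1")

# Placeholder Auto Scaling group name.
ASG_NAME = "web-tier-asg"

# Target tracking keeps average CPU near 50%, adding instances when
# demand rises and removing them when it falls.
autoscaling.put_scaling_policy(
    AutoScalingGroupName=ASG_NAME,
    PolicyName="keep-cpu-at-50-percent",
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ASGAverageCPUUtilization"
        },
        "TargetValue": 50.0,
    },
)
```

Target tracking is only one of several policy types, but the principle is the same for all of them: capacity follows measured demand rather than a fixed guess.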
Stopping guessing capacity
If resources become overwhelmed, they have a tendency to fail. This is especially true on-premises, where resources can't scale up or out to meet spikes in demand.
There are service limits to be aware of, though many of them are soft limits, which can be raised with a simple request or support case. Others are hard limits: they are set at a fixed number for every account, and there is no raising them.
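To check where an account stands against a given limit, you can query the Service Quotas API. A minimal boto3 sketch follows, assuming the EC2 On-Demand Standard instances quota code as the example:

```python
import boto3

quotas = boto3.client("service-quotas", region_name="us-east-1")

# Quota code assumed here: L-1216C47A is the Running On-Demand
# Standard instances quota for EC2 at the time of writing.
quota = quotas.get_service_quota(
    ServiceCode="ec2",
    QuotaCode="L-1216C47A",
)["Quota"]

print(f"{quota['QuotaName']}: {quota['Value']}")

# Soft limits can be raised programmatically as well; hard limits cannot.
if quota["Adjustable"]:
    quotas.request_service_quota_increase(
        ServiceCode="ec2",
        QuotaCode="L-1216C47A",
        DesiredValue=quota["Value"] * 2,
    )
```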
Note
Although you don't need to memorize all these limits, it is a good idea to become familiar with them, since some do show up in test questions – not as direct questions, but as context for the scenarios.
Managing changes in automation
Although it may seem easier, and sometimes quicker, to change the infrastructure (or application) by hand, doing so leads to infrastructure drift and is not a repeatable process. A best practice is to automate all changes using Infrastructure as Code, a code versioning system, and a deployment pipeline, so that changes can be tracked and reviewed.
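As a small illustration of that practice, here is a minimal AWS CDK sketch in Python defining a versioned S3 bucket; the stack and bucket names are hypothetical:

```python
import aws_cdk as cdk
from aws_cdk import aws_s3 as s3
from constructs import Construct


class StorageStack(cdk.Stack):
    """A hypothetical stack: one versioned S3 bucket, defined as code."""

    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        # Every change to this definition goes through code review and
        # the deployment pipeline, so there is no drift from manual edits.
        s3.Bucket(self, "AppDataBucket", versioned=True)


app = cdk.App()
StorageStack(app, "StorageStack")
app.synth()
```

Committed to version control and deployed through a pipeline, a definition like this is reviewed, tracked, and repeatable in a way that hand-made changes never are.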