You're reading from Implementing Enterprise Observability for Success: Strategically plan and implement observability using real-life examples

Product type Paperback

Published in Jun 2023

Publisher Packt

ISBN-13 9781804615690

Length 164 pages

Edition 1st Edition

Concepts

DevOps

Authors (2):

Karun Krishnannair

Manisha Agrawal

Table of Contents (16) Chapters

Preface

1. Part 1 – Understanding Observability in the Real World

2. Chapter 1: Why Observe?

3. Chapter 2: The Fundamentals of Observability

4. Chapter 3: The Real World and Its Challenges

5. Chapter 4: Collecting Data to Set Up Observability

6. Chapter 5: Observability Outcomes: Dashboards, Alerts, and Incidents

7. Part 2 – Planning and Implementation

8. Chapter 6: Gauging the Organization for Observability Implementation

9. Chapter 7: Achieving and Measuring Observability Success

10. Chapter 8: Identifying the Stakeholders

11. Chapter 9: Deciding the Tools for Observability

12. Part 3 – Use Cases

13. Chapter 10: Kickstarting Your Own Observability Journey

14. Index

15. Other Books You May Enjoy

Issues with traditional monitoring techniques

Traditional monitoring techniques focused on collecting a few predefined metrics and using them both to assess the system’s health and to trigger alerts. IT systems were managed and operated in isolation, and an organization’s IT management and engineering processes were framed around this isolated approach. Many IT system providers built monitoring tools that primarily monitored an application’s health in isolation.

Let’s discuss the issues with traditional monitoring techniques and why they no longer fit the bill for observability implementations. You may already be past these challenges, but we still recommend reading through each of the challenges as we talk about them from an observability perspective.

Modern infrastructure

Let’s consider a service that depends on three applications. The traditional approach would have identified key parameters that define the health of each of these three applications. Each application would be monitored separately, on the assumption that if the applications are healthy individually, then the business service that depends on them (fully or partially) is also healthy and serves customers efficiently. There was no concept of a service in this approach.
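The gap in this assumption can be illustrated with a minimal Python sketch. Everything in it is hypothetical and chosen only for illustration: the application names, the metric values, the per-application thresholds, the service-level latency target, and the assumption that the service calls the three applications one after another.

```python
from dataclasses import dataclass


@dataclass
class AppMetrics:
    name: str
    cpu_percent: float
    latency_ms: float


# Hypothetical per-application thresholds, decided in advance.
CPU_LIMIT_PERCENT = 80.0
LATENCY_LIMIT_MS = 300.0

# Hypothetical target for the end-to-end business service.
SERVICE_LATENCY_TARGET_MS = 600.0


def app_is_healthy(app: AppMetrics) -> bool:
    """Traditional check: each application is judged in isolation."""
    return app.cpu_percent < CPU_LIMIT_PERCENT and app.latency_ms < LATENCY_LIMIT_MS


def service_latency_ms(apps: list[AppMetrics]) -> float:
    """Assumes the service calls the applications sequentially."""
    return sum(app.latency_ms for app in apps)


# Illustrative readings: every application is inside its own limits.
apps = [
    AppMetrics("orders", cpu_percent=55.0, latency_ms=250.0),
    AppMetrics("payments", cpu_percent=60.0, latency_ms=280.0),
    AppMetrics("inventory", cpu_percent=40.0, latency_ms=270.0),
]

all_apps_healthy = all(app_is_healthy(app) for app in apps)
service_healthy = service_latency_ms(apps) <= SERVICE_LATENCY_TARGET_MS

print(f"All applications healthy: {all_apps_healthy}")
print(f"Service within target:    {service_healthy}")
```

With these made-up numbers, every application passes its own check, yet the combined latency of the request exceeds the service target. Per-application monitoring alone would report nothing wrong.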

This method would have worked well for a traditional infrastructure, where the application was monolithic and hosted on physical hardware in data centers. This guaranteed a certain amount of resources for the application to run. Then came virtualization, which added another layer on top of the physical hardware, and the guarantee of dedicated resources was gone. The adoption of cloud infrastructure services such as AWS and GCP and cloud-native technologies such as serverless architecture, microservices, and containers has completely decoupled infrastructure and applications, making IT systems more complex and interdependent. These technologies have introduced a level of unpredictability into the operation of IT systems. Hence, the concepts, practices, and tools used for managing and maintaining the health of applications also have to change accordingly.

Pre-empting issues

One of the key issues with the traditional monitoring approach is that you have to decide in advance which metrics to collect and monitor. Many of these key indicators or metrics are chosen based on the past experiences of vendors, administrators, and system engineers. With more experience, engineers can come up with more and better key indicators. While this was effective to a certain extent in traditional infrastructure environments, modern distributed architecture has introduced many interdependencies and much complexity into IT environments, where the source of a problem can vary drastically. Hence, trying to pre-empt potential health indicators or metrics can be quite inaccurate and challenging.

Identifying why and where the problem exists

The main purpose of conventional monitoring is to detect when there is a problem. It provides a simple green, amber, or red health status indication but doesn’t answer why the issue occurs or where it originates. Once an issue has been flagged, it’s up to the administrators and engineers to figure out where and why the problem exists. Since modern infrastructure services are very transient, identifying the source of the problem is difficult and time-consuming. Hence, answering why and where as quickly as possible is critical in reducing MTTR and maintaining a stable service.
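As a rough illustration of how little a status indicator carries, the following Python sketch maps a single metric to green, amber, or red. The metric (an error rate) and both threshold values are assumptions made for this example, not values taken from any particular tool.

```python
def health_status(error_rate: float, amber_at: float = 0.01, red_at: float = 0.05) -> str:
    """Map one predefined metric to a traffic-light status.

    The thresholds are hypothetical defaults chosen for illustration.
    """
    if error_rate >= red_at:
        return "red"
    if error_rate >= amber_at:
        return "amber"
    return "green"


# Illustrative readings for the same service at three points in time.
for rate in (0.002, 0.02, 0.08):
    print(f"error rate {rate:.3f} -> {health_status(rate)}")
```

The function returns only a color. Nothing in its output says which component is failing or what caused the errors, so that investigation is still left to the engineers.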