You're reading from Observability with Grafana: Monitor, control, and visualize your Kubernetes and cloud platforms using the LGTM stack

Product type Paperback

Published in Jan 2024

Publisher Packt

ISBN-13 9781803248004

Length 356 pages

Edition 1st Edition

Languages: Python

Tools: Grafana

Concepts: DevOps

Authors (2):

Rob Chapman

Peter Holmes



Observability in a nutshell

The term observability is borrowed from control theory. It’s common to use the term interchangeably with the term monitoring in IT systems, as the concepts are closely related. Monitoring is the ability to raise an alarm when something is wrong, while observability is the ability to understand a system and determine whether something is wrong, and why.

Control theory was formalized in the late 1800s through the study of centrifugal governors in steam engines. The following diagram shows a simplified view of such a system:

Figure 1.1 – James Watt’s steam engine flyweight governor (source: https://www.mpoweruk.com)

Steam engines use a boiler to boil water in a pressure vessel. The steam pushes a piston backward and forward, converting heat energy into reciprocating motion. In steam engines that use centrifugal governors, this reciprocating motion is converted into rotation via a wheel connected to the piston. The centrifugal governor provides a physical feedback link from this rotation back through the system to the throttle. This means that the speed of rotation controls the throttle, which, in turn, controls the speed of rotation. Physically, this can be seen as the balls on the governor fly outward and drop inward until the system reaches equilibrium.
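The feedback loop described above can be sketched as a tiny discrete-time simulation in Python. This is a toy model under stated assumptions, not a physical model of a steam engine: the linear linkage in `throttle_from_speed` and the `set_point`, `gain`, `engine_power` and `drag` values are hypothetical, chosen only so that the loop visibly settles.

```python
def throttle_from_speed(speed: float, set_point: float, gain: float) -> float:
    """Hypothetical linear governor linkage: faster rotation closes the throttle.

    Returns a throttle opening clamped between 0.0 (fully closed) and 1.0 (fully open).
    """
    opening = 0.5 - gain * (speed - set_point)
    return max(0.0, min(1.0, opening))


def simulate_governor(steps: int = 200, speed: float = 0.0,
                      set_point: float = 100.0, gain: float = 0.01,
                      engine_power: float = 40.0, drag: float = 0.2) -> list[float]:
    """Simulate shaft speed over time under governor feedback (toy model, made-up units)."""
    history = []
    for _ in range(steps):
        # The governor reads the current speed and sets the throttle accordingly.
        throttle = throttle_from_speed(speed, set_point, gain)
        # Steam admitted through the throttle speeds the shaft up; load and friction slow it down.
        speed += engine_power * throttle - drag * speed
        history.append(speed)
    return history
```

With these made-up values, the speed climbs from 0 and settles at approximately 100, the point where the steam admitted through the throttle exactly balances the drag. As with any purely proportional linkage, changing the load (`drag`) moves that equilibrium away from `set_point`.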

Monitoring defines in advance the metrics or events that are of interest. For instance, the governor measures the predefined metric of drive shaft revolutions. The throttle is then controlled through the pivot and actuator rod assembly. Assuming the actuator rod is adjusted correctly, the governor should be able to move the throttle through its full range, from fully open to fully closed.

In contrast, observability is achieved by allowing the internal state of the system to be inferred from its external outputs. If the operating point adjustment is incorrectly set, the governor may spin too fast or too slowly, rendering the throttle control ineffective. A governor spinning too fast or too slowly could also indicate that the sliding ring is stuck in place and needs oiling. Importantly, this insight can be gained without defining in advance what too fast or too slow means. Recognizing that the governor is spinning too fast or too slowly also requires very little knowledge of the rest of the steam engine.
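In software, spotting that something is off without defining "too fast" in advance can be approximated by learning a baseline from the system's own recent outputs. The following minimal sketch uses a rolling z-score; the function name, window size and 3-standard-deviation cutoff are illustrative assumptions, and the z-score is just one simple anomaly-detection technique among many, not something the text prescribes.

```python
from collections import deque
from statistics import mean, stdev


def find_deviations(readings: list[float], window: int = 20,
                    z_limit: float = 3.0) -> list[int]:
    """Return the indexes of readings that deviate strongly from the recent baseline.

    The baseline is learned from the previous `window` readings, so no absolute
    value for "too fast" or "too slow" has to be configured in advance.
    """
    recent = deque(maxlen=window)  # sliding window of the most recent readings
    anomalies = []
    for i, value in enumerate(readings):
        if len(recent) == window:
            baseline = list(recent)
            mu, sigma = mean(baseline), stdev(baseline)
            if sigma == 0:
                # A perfectly flat baseline: any change at all counts as a deviation.
                is_anomaly = value != mu
            else:
                # Distance from the baseline mean, measured in standard deviations.
                is_anomaly = abs(value - mu) / sigma > z_limit
            if is_anomaly:
                anomalies.append(i)
        recent.append(value)
    return anomalies
```

For example, a governor speed series that alternates between 100 and 101 and then jumps once to 130 has only that jump flagged. The relative `z_limit` still has to be chosen, but it describes "unusual for this system" rather than a fixed speed limit.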

Fundamentally, both monitoring and observability are used to improve the reliability and performance of the system in question.

Now that we have introduced the high-level concepts, let’s explore a practical example outside of the world of software services.

Case study – A ship passing through the Panama Canal

Let’s imagine a ship traversing the Agua Clara locks on the Panama Canal. This can be illustrated using the following figure:

Figure 1.2 – The Agua Clara locks on the Panama Canal

There are a few aspects of these locks that we might want to monitor:

- The successful opening and closing of each gate
- The water level inside each lock
- How long it takes for a ship to traverse the locks

Monitoring these aspects may highlight situations that we need to be alerted about:

- A gate is stuck open because of a mechanical failure
- The water level is rapidly descending because of a leak
- A ship is taking too long to exit the locks because it is stuck
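The three situations above can be written as rules whose thresholds are fixed in advance, which is the essence of monitoring as described earlier. The sketch below is a minimal Python illustration; the `LockSnapshot` fields, units and every threshold value are hypothetical and are not based on real Agua Clara lock data. Because a lock's water level also falls on purpose when the chamber is emptied, the sketch only treats a falling level as suspicious when no draining is in progress.

```python
from dataclasses import dataclass


@dataclass
class LockSnapshot:
    """Hypothetical telemetry for one lock chamber at a single point in time."""
    gate_commanded_closed: bool
    gate_closed: bool
    seconds_since_gate_command: float
    chamber_draining: bool                # True while the lock is deliberately being emptied
    water_level_change_m_per_min: float   # negative while the water level is falling
    ship_minutes_in_lock: float


# Monitoring: thresholds are fixed in advance (all values here are made up).
GATE_TIMEOUT_S = 300.0
MAX_UNEXPECTED_DROP_M_PER_MIN = 0.05
MAX_MINUTES_IN_LOCK = 120.0


def check_alerts(s: LockSnapshot) -> list[str]:
    """Return the names of any predefined alert conditions that are currently true."""
    alerts = []
    if (s.gate_commanded_closed and not s.gate_closed
            and s.seconds_since_gate_command > GATE_TIMEOUT_S):
        alerts.append("gate stuck open")
    if (not s.chamber_draining
            and -s.water_level_change_m_per_min > MAX_UNEXPECTED_DROP_M_PER_MIN):
        alerts.append("water level descending rapidly")
    if s.ship_minutes_in_lock > MAX_MINUTES_IN_LOCK:
        alerts.append("ship taking too long to exit")
    return alerts
```

Each rule answers a yes/no question that someone decided to ask beforehand; anything the rules do not describe goes unnoticed.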

There may be situations where the data we are monitoring are within acceptable limits, but we can still observe a deviation from what is considered normal, which should prompt further action:

- A small leak has formed near the top of the lock wall:
  - We would see the water level drop, but only while it is above the leak
  - This could prompt maintenance work on the lock wall
- A gate in one lock is opening more slowly because it needs maintenance:
  - We would see the time taken to open the gate increase
  - This could prompt maintenance on the lock gate
- Ships take longer to traverse the locks when the wind is coming from a particular direction:
  - We could compare hourly average traversal rates across different wind directions
  - This could prompt work to reduce the impact of wind from that direction
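Breaking a metric down by a second attribute, as in the wind example, is a common way to surface a pattern that no fixed threshold would catch. A minimal Python sketch follows; it uses per-ship traversal times rather than the hourly averages mentioned above, and the `(wind_direction, minutes)` record format and the sample values in the usage example are invented for illustration.

```python
from collections import defaultdict
from statistics import mean


def traversal_by_wind(records: list[tuple[str, float]]) -> dict[str, float]:
    """Return how far each wind direction's mean traversal time is from the overall mean.

    `records` holds hypothetical (wind_direction, traversal_minutes) pairs.
    Positive values mean ships are slower than usual under that wind direction.
    """
    groups: dict[str, list[float]] = defaultdict(list)
    for direction, minutes in records:
        groups[direction].append(minutes)
    overall = mean(minutes for _, minutes in records)
    return {direction: mean(times) - overall for direction, times in groups.items()}
```

For instance, `traversal_by_wind([("N", 60.0), ("N", 62.0), ("S", 61.0), ("S", 59.0), ("E", 80.0), ("E", 78.0)])` reports a positive difference only for `"E"`, pointing at east winds as worth investigating even though no single crossing may have triggered an alert.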

Now that we’ve seen an example of measuring a real-world system, we can group these types of measurements into different data types to best suit the application. Let’s introduce those now.
