Cloud-Native Observability with OpenTelemetry

Learn to gain visibility into systems by combining tracing, metrics, and logging with OpenTelemetry

Product type Paperback
Published in May 2022
Publisher Packt
ISBN-13 9781801077705
Length 386 pages
Edition 1st Edition
Author: Alex Boten

Reviewing the history of observability

In many ways, being able to understand what a computer is doing is both fun and challenging when working with software. The ability to understand how systems are behaving has gone through quite a few iterations since the early 2000s. Many different markets have been created to address this need, such as systems monitoring, log management, and application performance monitoring. As is often the case, when new challenges come knocking, the doors of opportunity open to those willing to tackle them. Over the same period, countless vendors and open source projects have sprung up to help people who build and operate services manage their systems. The term observability, however, is a recent addition to the software industry and comes from control theory.

Wikipedia (https://en.wikipedia.org/wiki/Observability) defines observability as:

"In control theory, observability is a measure of how well internal states of a system can be inferred from knowledge of its external outputs."

Observability is an evolution of its predecessors, built on lessons learned through years of experience and trial and error. To better understand where observability is today, it's important to understand where some of the methods used today by cloud-native application developers come from, and how they have changed over time. We'll start by looking at the following:

  • Centralized logging
  • Metrics and dashboards
  • Tracing and analysis

Centralized logging

One of the first pieces of software a programmer writes when learning a new language is a form of observability: "Hello, World!". Printing some text to the terminal is usually one of the quickest ways to provide users with feedback that things are working, and that's why "Hello, World" has been a tradition in computing since the late 1960s.

One of my favorite methods for debugging is still to add print statements across the code when things aren't working. I've even used this method to troubleshoot an application distributed across multiple servers before, although I can't say it was my proudest moment, as it caused one of our services to go down temporarily because of a typo in an unfamiliar editor. Print statements are great for simple debugging, but unfortunately, this only scales so far.

Once an application is large enough or distributed across enough systems, searching through the logs on individual machines is not practical. Applications can also run on ephemeral machines that may no longer be present when we need those logs. Combined, all of this created a need to make the logs available in a central location for persistent storage and searchability, and thus centralized logging was born.

Many vendors provide a destination for logs, as well as features for searching and alerting based on those logs. There are also many open source projects that have tried to tackle the challenges of standardizing log formats, providing mechanisms for transport, and storing the logs. The following are some of these projects:

  • Fluentd (https://www.fluentd.org)
  • Logstash (https://github.com/elastic/logstash)
  • Apache Flume (https://flume.apache.org)

Centralized logging additionally provides the opportunity to produce metrics about the data across the entire system.
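
To make that last point concrete, here is a minimal sketch that derives a simple metric from logs once they sit in one place. It assumes the logs have already been shipped from several hosts and stored as one JSON object per line. The field names (host, level, message), the sample records, and the count_by_level helper are all hypothetical and chosen only for illustration; they are not taken from any of the projects listed above.

```python
import json
from collections import Counter

# Hypothetical log records, as they might look after being shipped from
# several hosts to one central store (one JSON object per line).
CENTRALIZED_LOGS = """\
{"host": "web-1", "level": "INFO", "message": "request handled"}
{"host": "web-2", "level": "ERROR", "message": "upstream timeout"}
{"host": "web-1", "level": "INFO", "message": "request handled"}
{"host": "db-1", "level": "WARNING", "message": "slow query"}
{"host": "web-2", "level": "ERROR", "message": "upstream timeout"}
"""


def count_by_level(lines):
    """Count log records per severity level across all hosts."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        counts[record["level"]] += 1
    return counts


if __name__ == "__main__":
    counts = count_by_level(CENTRALIZED_LOGS.splitlines())
    for level, count in sorted(counts.items()):
        print(f"{level}: {count}")
```

Because every record lands in the same store, a single pass answers a question about the whole system, which would otherwise require logging in to each machine in turn.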

Using metrics and dashboards

Metrics are possibly the most well-known of the tools available in the observability space. Think of the temperature on a thermometer, the speed on the speedometer of a car, or the time on a watch. We humans love measuring and quantifying things. From the early days of computing, being able to keep track of how resources were utilized was critical in ensuring that multi-user environments provided a good user experience for all users of the system.

Nowadays, measuring application and system performance via the collection of metrics is common practice in software development. This data is converted into graphs to generate meaningful visualizations for those in charge of monitoring the health of a system.

These metrics can also be used to configure alerting when certain thresholds have been reached, such as when an error rate becomes greater than an acceptable percentage. In certain environments, metrics are used to automate workflows as a reaction to changes in the system, such as increasing the number of application instances or rolling back a bad deployment. As with logging, over time, many vendors and projects provided their own solutions to metrics, dashboards, monitoring, and alerting. Some of the open source projects that focus on metrics are as follows:

  • Prometheus (https://prometheus.io)
  • StatsD (https://github.com/statsd/statsd)
  • Graphite (https://graphiteapp.org)
  • Grafana (https://github.com/grafana/grafana)
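
The threshold-based alerting described earlier in this section can be sketched in a few lines. This is only an illustration of the idea, not how any of the projects above implement alerting: the function names and the 5% default threshold are assumptions made for the example.

```python
def error_rate(error_count, total_count):
    """Return the fraction of requests that failed (0.0 when there is no traffic)."""
    if total_count == 0:
        return 0.0
    return error_count / total_count


def should_alert(error_count, total_count, threshold=0.05):
    """Return True when the error rate is strictly greater than the threshold.

    The 5% default is an arbitrary value picked for this example.
    """
    return error_rate(error_count, total_count) > threshold


if __name__ == "__main__":
    # 12 failures out of 200 requests is a 6% error rate, above the assumed 5% limit.
    print(should_alert(12, 200))
    # 4 failures out of 200 requests is a 2% error rate, below the limit.
    print(should_alert(4, 200))
```

The same comparison could just as well trigger an automated reaction, such as a rollback, instead of notifying a person.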

Let's now look at tracing and analysis.

Applying tracing and analysis

Tracing an application means being able to step through its code and verify that it's doing what is expected. This can often, but not always, be achieved in development using a debugger such as GDB (https://www.gnu.org/software/gdb/) or, in Python, PDB (https://docs.python.org/3/library/pdb.html). This approach becomes impractical when debugging an application that is spread across multiple services on different hosts across a network. Researchers at Google published a white paper on a large-scale distributed tracing system built internally: Dapper (https://research.google/pubs/pub36356/). In this paper, they describe the challenges of distributed systems, as well as the approach that was taken to address the problem. This research is the basis of distributed tracing as it exists today. After the paper was published, several open source projects sprang up to provide users with the tools to trace and visualize applications using distributed tracing:

  • OpenTracing (https://opentracing.io)
  • OpenCensus (https://opencensus.io)
  • Zipkin (https://zipkin.io)
  • Jaeger (https://www.jaegertracing.io)
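
To give a feel for the core idea behind distributed tracing, the following is a deliberately simplified sketch. Each unit of work is recorded as a span, and a shared trace ID is passed along with every downstream call so that the spans can later be stitched back together. This is not the API of any project listed above, nor the design described in the Dapper paper: the header names (trace-id, span-id), the span fields, and the two stand-in services are hypothetical, and the network call is simulated with a plain function call.

```python
import time
import uuid

# Collected spans; a real system would send these to a tracing backend.
SPANS = []


def start_span(name, headers):
    """Create a span, reusing the trace ID from incoming headers if present."""
    return {
        "trace_id": headers.get("trace-id") or uuid.uuid4().hex,
        "span_id": uuid.uuid4().hex[:16],
        "parent_id": headers.get("span-id"),  # None for the first span in a trace
        "name": name,
        "start": time.time(),
    }


def end_span(span):
    """Record the end time and store the finished span."""
    span["end"] = time.time()
    SPANS.append(span)


def outgoing_headers(span):
    """Headers a service would attach to a downstream request (hypothetical names)."""
    return {"trace-id": span["trace_id"], "span-id": span["span_id"]}


def inventory_service(headers):
    """Stand-in for a second service running on another host."""
    span = start_span("inventory.lookup", headers)
    end_span(span)


def checkout_service(headers):
    """Stand-in for the service that receives the user's request."""
    span = start_span("checkout.handle", headers)
    inventory_service(outgoing_headers(span))  # simulated network call
    end_span(span)


if __name__ == "__main__":
    checkout_service({})
    for span in SPANS:
        print(span["name"], span["trace_id"], span["parent_id"])
```

Both spans carry the same trace ID, and the inner span's parent ID points at the outer span, which is what lets a tracing tool rebuild the path of one request across services.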

As you can imagine, with so many tools, it can be daunting to even know where to begin on the journey to making a system observable. Users and organizations must spend time and effort upfront to even get started. This can be challenging when other deadlines are looming. Not only that, but the time investment needed to instrument an application can be significant depending on the complexity of the application, and the return on that investment sometimes isn't made clear until much later. The time and money invested, as well as the expertise required, can make it difficult to change from one tool to another if the initial implementation no longer fits your needs as the system evolves.

Such a wide array of methods, tools, libraries, and standards has also caused fragmentation in the industry and the open source community. Libraries end up supporting one format or another, leaving it up to users to fill any gaps within their own environments. It also takes effort to maintain feature parity across different projects. All of this could be addressed by bringing the people working in these communities together.

With a better understanding of the different tools at the disposal of application developers, their evolution, and their role, we can start to better appreciate the scope of the problem OpenTelemetry is trying to solve.
