You're reading from Modern Distributed Tracing in .NET A practical guide to observability and performance analysis for microservices

Product type Paperback

Published in Jun 2023

Publisher Packt

ISBN-13 9781837636136

Length 336 pages

Edition 1st Edition

Tools

.NET

Concepts

Application Development

Author (1):

Liudmila Molkova

View More author details

Table of Contents (23) Chapters

Preface

1. Part 1: Introducing Distributed Tracing

2. Chapter 1: Observability Needs of Modern Applications FREE CHAPTER

3. Chapter 2: Native Monitoring in .NET

4. Chapter 3: The .NET Observability Ecosystem

5. Chapter 4: Low-Level Performance Analysis with Diagnostic Tools

6. Part 2: Instrumenting .NET Applications

7. Chapter 5: Configuration and Control Plane

8. Chapter 6: Tracing Your Code

9. Chapter 7: Adding Custom Metrics

10. Chapter 8: Writing Structured and Correlated Logs

11. Part 3: Observability for Common Cloud Scenarios

12. Chapter 9: Best Practices

13. Chapter 10: Tracing Network Calls

14. Chapter 11: Instrumenting Messaging Scenarios

15. Chapter 12: Instrumenting Database Calls

16. Part 4: Implementing Distributed Tracing in Your Organization

17. Chapter 13: Driving Change

18. Chapter 14: Creating Your Own Conventions

19. Chapter 15: Instrumenting Brownfield Applications

20. Assessments

21. Index

Why subscribe?

22. Other Books You May Enjoy

Reviewing context propagation

Correlation and causation are the foundation of distributed tracing. We’ve just covered how related spans share the same trace-id and have a pointer to the parent recorded in parent-span-id, forming a casual chain of operations. Now, let’s explore how it works in practice.

In-process propagation

Even within a single service, we usually have nested spans. For example, if we trace a request to a REST service that just reads an item from a database, we’d want to see at least two spans – one for an incoming HTTP request and another for a database query. To correlate them properly, we need to pass span context from ASP.NET Core to the database driver.

One option is to pass context explicitly as a function argument. It’s a viable solution in Go, where explicit context propagation is a standard, but in .NET, it would make onboarding onto distributed tracing difficult and would ruin the auto-instrumentation magic.

.NET Activity (aka the span) is propagated implicitly. Current activity can always be accessed via the Activity.Current property, backed up by System.Threading.AsyncLocal<T>.

Using our previous example of a service reading from the database, ASP.NET Core creates an Activity for the incoming request, and it becomes current for anything that happens within the scope of this request. Instrumentation for the database driver creates another one that uses Activity.Current as its parent, without knowing anything about ASP.NET Core and without the user application passing the Activity around. The logging framework would stamp trace-id and span-id from Activity.Current, if configured to do so.

It works for sync or async code, but if you process items in the background using in-memory queues or manipulate with threads explicitly, you would have to help runtime and propagate activities explicitly. We’ll talk more about it in Chapter 6, Tracing Your Code.

Out-of-process propagation

In-process correlation is awesome, and for monolith applications, it would be almost sufficient. But in the microservice world, we need to trace requests end to end and, therefore, propagate context over the wire, and here’s where standards come into play.

You can find multiple practices in this space – every complex system used to support something custom, such as x-correlation-id or x-request-id. You can find x-cloud-trace-context or grpc-trace-bin in old Google systems, X-Amzn-Trace-Id on AWS, and Request-Id variations and ms-cv in the Microsoft ecosystem. Assuming your system is heterogeneous and uses a variety of cloud providers and tracing tools, correlation is difficult.

Trace context (which you can explore in more detail at https://www.w3.org/TR/trace-context) is a relatively new standard, converting context propagation over HTTP, but it’s widely adopted and used by default in OpenTelemetry and .NET.

W3C Trace Context

The trace context standard defines traceparent and tracestate HTTP headers and the format to populate context on them.

The traceparent header

The traceparent is an HTTP request header that carries the protocol version, trace-id, parent-id, and trace-flags in the following format:

traceparent: {version}-{trace-id}-{parent-id}-{trace-flags}

version: The protocol version – only 00 is defined at the moment.
trace-id: The logical end-to-end operation ID.
parent-id: Identifies the client span and serves as a parent for the corresponding server span.
trace-flags: Represents the sampling decision (which we’ll talk about in Chapter 5, Configuration and Control Plane). For now, we can determine that 00 indicates that the parent span was sampled out and 01 means it was sampled in.

All identifiers must be present – that is, traceparent has a fixed length and is easy to parse. Figure 1.8 shows an example of context propagation with the traceparent header:

Figure 1.8 – traceparent is populated from the outgoing span context and becomes a parent for the incoming span

Note

The protocol does not require creating spans and does not specify instrumentation points. Common practice is to create spans per outgoing and incoming requests, and put client span context into request headers.

The tracestate header

The tracestate is another request header, which carries additional context for the tracing tool to use. It’s designed for OpenTelemetry or an APM tool to carry additional control information and not for application-specific context (covered in detail later in the Baggage section).

The tracestate consists of a list of key-value pairs, serialized to a string with the following format: "vendor1=value1,vendor2=value2".

The tracestate can be used to propagate incompatible legacy correlation IDs, or some additional identifiers vendor needs.

OpenTelemetry, for example, uses it to carry a sampling probability and score. For example, tracestate: "ot=r:3;p:2" represents a key-value pair, where the key is ot (OpenTelemetry tag) and the value is r:3;p:2.

The tracestate header has a soft limitation on size (512 characters) and can be truncated.

The traceresponse (draft) header

Unlike traceparent and tracestate, traceresponse is a response header. At the time of writing, it’s defined in W3C Trace-Context Level 2 (https://www.w3.org/TR/trace-context-2/) and has reached W3C Editor’s Draft status. There is no support for it in .NET or OpenTelemetry.

traceresponse is very similar to traceparent. It has the same format, but instead of client-side identifiers, it returns the trace-id and span-id values of the server span:

traceresponse: 00-{trace-id}-{child-id}-{trace-flags}

traceresponse is optional in the sense that the server does not need to return it, even if it supports W3C Trace-Context Level 2. It’s useful to return traceresponse when the client did not pass a valid traceparent, but can log traceresponse.

External-facing services may decide to start a new trace, because they don’t trust the caller’s trace-id generation algorithm. Uniform random distribution is one concern; another reason could be a special trace-id format. If the service restarts a trace, it’s a good idea to return the traceresponse header to caller.

B3

The B3 specification (https://github.com/openzipkin/b3-propagation) was adopted by Zipkin – one of the first distributed tracing systems.

B3 identifiers can be propagated as a single b3 header in the following format:

b3: {trace-id}-{span-id}-{sampling-state}-{parent-span-id}

Another way is to pass individual components, using X-B3-TraceId, X-B3-SpanId, X-B3-ParentSpanId, and X-B3-Sampled.

The sampling state suggests whether a service should trace the corresponding request. In addition to 0 (don’t record) and 1 (do record), it allows us to force tracing with a flag set to d. It’s usually done for debugging purposes. The sampling state can be passed without other identifiers to specify the desired sampling decision to the service.

Note

The key difference with W3C Trace-Context, beyond header names, is the presence of both span-id and parent-span-id. B3 systems can use the same span-id on the client and server sides, creating a single span for both.

Zipkin reuses span-id from the incoming request, also specifying parent-span-id on it. The Zipkin span represents the client and server at the same time, as shown in Figure 1.9, recording different durations and statuses for them:

Figure 1.9 – Zipkin creates one span to represent the client and server

OpenTelemetry and .NET support b3 headers but ignore parent-span-id – they generate a new span-id for every span, as it’s not possible to reuse span-id (see Figure 1.10).

Figure 1.10 – OpenTelemetry does not use parent-span-id from B3 headers and creates different spans for the client and server

Baggage

So far, we have talked about span context and correlation. But in many cases, distributed systems have application-specific context. For example, you authorize users on your frontend service, and after that, user-id is not needed for application logic, but you still want to add it as an attribute on spans from all services to query and aggregate it on a per-user basis.

You can stamp user-id once on the frontend. Then, spans recorded on the backend will not have user-id, but they will share the same trace-id as the frontend. So, with some joins in your queries, you can still do per-user analysis. It works to some extent but may be expensive or slow, so you might decide to propagate user-id and stamp it on the backend spans too.

The baggage (https://www.w3.org/TR/baggage/) defines a generic propagation format for distributed context, and you can use it for business logic or anything else by adding, reading, removing, and modifying baggage members. For example, you can route requests to the test environment and pass feature flags or extra telemetry context.

Baggage consists of a list of semicolon-separated members. Each member has a key, value, and optional properties in the following formats – key=value;property1;key2=property2 or key=value;property1;key2=property2,anotherKey=anotherValue.

OpenTelemetry and .NET only propagate baggage, but don’t stamp it on any telemetry. You can configure ILogger to stamp baggage and need to enrich traces explicitly. We’ll see how it works in Chapter 5, Configuration and Control Plane.

Tip

You should not put any sensitive information in baggage, as it’s almost impossible to guarantee where it would flow – your application or sidecar infrastructure can forward it to your cloud provider or anywhere else.

Maintain a list of well-known baggage keys across your system and only use known ones, as you might receive baggage from another system otherwise.

Baggage specification has a working draft status and may still change.

Note

While the W3C Trace Context standard is HTTP-specific and B3 applies to any RPC calls, they are commonly used for any context propagation needs – for example, they are passed as the event payload in messaging scenarios. This may change once protocol-specific standards are introduced.

You're reading from Modern Distributed Tracing in .NET A practical guide to observability and performance analysis for microservices

Table of Contents (23) Chapters

Reviewing context propagation

In-process propagation

Out-of-process propagation

W3C Trace Context

The traceparent header

The tracestate header

The traceresponse (draft) header

B3

Baggage

Authors (1)

Personalised recommendations for you