What do you get with eBook?

Instant access to your Digital eBook purchase

Download this book in EPUB and PDF formats

Access this title in our online reader with advanced features

DRM FREE - Read whenever, wherever and however you want

AI Assistant (beta) to help accelerate your learning

Building Low Latency Applications with C++

Introducing Low Latency Application Development in C++

Let us kick off our journey with low latency applications by introducing them in this first chapter. In this chapter, we will first understand the behavior and requirements of latency-sensitive and latency-critical applications. We will understand the huge business impact that application latencies have for businesses that rely on quick and strict response times.

We will also discuss why C++ is one of the most preferred programming languages when it comes to low latency application development. We will spend a large part of this book building an entire low latency electronic trading system from scratch in C++. So, this will serve as a good chapter for you to understand the motivation for using C++ as well as what makes it the most popular language for low latency applications.

We will also present some of the important low latency applications in different business areas. Part of the motivation is to make you understand that latencies are indeed very critical in different business areas for use cases that are sensitive to response times. The other part of the motivation is to identify the similarities in the behavior, expectations, design, and implementation of these applications. Even though they solve different business problems, the low latency requirements of these applications are often built on similar technical design and implementation principles.

In this chapter, we will cover the following topics:

Understanding the requirements for latency-sensitive applications
Understanding why C++ is the preferred programming language
Introducing some important low latency applications

In order to build ultra-low latency applications effectively, we should first understand the terms and concepts we will refer to throughout the rest of this book. We should also understand why C++ has emerged as the clear choice for most low latency application development. It is also important to always keep the business impact of low latencies in mind because the aim is to build low latency applications to benefit the business’s bottom line. This chapter discusses these ideas so that you can build a good foundation before we dive into the technical details in the rest of this book.

Understanding requirements for latency-sensitive applications

In this section, we will discuss some concepts that are required to build an understanding of what metrics matter for latency-sensitive applications. First, let’s define clearly what latency means and what latency-sensitive applications are.

Latency is defined as the time delay between when a task is started and when the task is finished. By definition, any processing or work will incur some overhead or latency – that is, no system has zero latency unless the system does absolutely no work. The important detail here is that some systems might have a latency that is a tiny fraction of a millisecond, and the tolerance for an additional microsecond there might be low.
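To make the definition concrete, here is a minimal sketch of timing a single task from start to finish. The helper name measureLatencyNanos is hypothetical and not from the book; it assumes that wall-clock time read from std::chrono::steady_clock is an acceptable measure for the task at hand.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <utility>

// Hypothetical helper: runs `task` once and returns the time elapsed between
// its start and its finish, in nanoseconds.
template <typename Task>
std::int64_t measureLatencyNanos(Task&& task) {
  // steady_clock is monotonic, so it is not affected by wall-clock adjustments.
  const auto start = std::chrono::steady_clock::now();
  std::forward<Task>(task)();  // the work being timed
  const auto finish = std::chrono::steady_clock::now();
  const std::int64_t elapsed =
      std::chrono::duration_cast<std::chrono::nanoseconds>(finish - start).count();
  assert(elapsed >= 0);  // a monotonic clock never runs backwards
  return elapsed;
}
```

Note that the measurement itself adds a small overhead (the two clock reads), which matters when the task being timed takes only a fraction of a microsecond.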

Low latency applications are applications that execute tasks and respond or return results as quickly as possible. The point here is that reaction latency is an important criterion for such applications where higher latencies can degrade performance or even render an application completely useless. On the other hand, when such applications perform with the low latencies that are expected of them, they can beat the competition, run at maximum speed, achieve maximum throughput, or increase productivity and improve the user experience – depending on the application and business.

Low latency can be thought of as both a quantitative and a qualitative term. The quantitative aspect is pretty obvious, but the qualitative aspect might not be. Depending on the context, architects and developers might be willing to accept higher latencies in some cases but be unwilling to accept an extra microsecond in others. For instance, if a user refreshes a web page or waits for a video to load, a few seconds of latency is quite acceptable. However, once the video loads and starts playing, it can no longer incur a few seconds of latency to render or display without negatively impacting the user experience. An extreme example is high-speed financial trading systems, where a few extra microseconds can make the difference between a profitable firm and a firm that cannot compete at all.

In the following subsections, we will present some nomenclature that applies to low latency applications. It is important to understand these terms well so that we can continue our discussion on low latency applications, as we will refer to these concepts frequently. The concepts and terms we will discuss next are used to differentiate between different latency-sensitive applications, the measurement of latencies, and the requirements of these applications.

Understanding latency-sensitive versus latency-critical applications

There is a subtle but important difference between the terms latency-sensitive applications and latency-critical applications. A latency-sensitive application is one in which reducing performance latencies improves the business impact or profitability. So, the system might still be functional and possibly profitable at higher performance latencies but can be significantly more profitable if latencies are reduced. Examples of such applications would be operating systems (OSes), web browsers, databases, and so on.

A latency-critical application, on the other hand, is one that fails completely if performance latency is higher than a certain threshold. The point here is that while latency-sensitive applications might only lose part of their profitability at higher latencies, latency-critical applications fail entirely at high enough latencies. Examples of such applications are traffic control systems, financial trading systems, autonomous vehicles, and some medical appliances.

Measuring latency

In this section, we will discuss different methods of measuring latency. The real difference between these methods comes down to what is considered the beginning of the processing task and what is considered the end. Another difference is the unit of measurement – time is the most common one, but in some cases, CPU clock cycles can also be used if it comes down to instruction-level measurements. Let’s look at the different measurements next, but first, we present a diagram of a generic server-client system without diving into the specifics of the use case or transport protocols. This is because measuring latency is generic and applies to many different applications with this kind of server-client setup.

Figure 1.1 – A general server-client system with timestamps between different hops

We present this diagram here because, in the next few subsections, we will define and understand latencies between the different hops on the round-trip path from the server to the client and back to the server.

Time to first byte

Time to first byte is measured as the time elapsed from when the sender sends the first byte of a request (or response) to the moment when the receiver receives the first byte. This typically (but not necessarily) applies to network links or systems where there are data transfer operations that are latency-sensitive. In Figure 1.1, time to first byte would be the difference between and

Round-trip time

Round-trip time (RTT) is the sum of the time it takes for a packet to travel from one process to another and then the time it takes for the response packet to reach the original process. Again, this is typically (but not necessarily) used for network traffic going back and forth between server and client processes, but can also be used for two processes communicating in general.

RTT, by default, includes the time taken by the server process to read, process, and respond to the request sent by the sender – that is, RTT generally includes server processing times. In the context of electronic trading, the true RTT latency is based on three components:

First, the time it takes for information from the exchange to reach the participant
Second, the time it takes for the execution of the algorithms to analyze the information and make a decision
Finally, the time it takes for the decision to reach the exchange and get processed by the matching engine

We will discuss this more in the last section of this book, Analyzing and improving performance.
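The three RTT components listed above can be sketched as simple differences between timestamps. The struct and field names below are hypothetical, and the sketch assumes all four timestamps are expressed in nanoseconds on one consistent clock; in practice, timestamps taken on different hosts need clock synchronization before they can be subtracted meaningfully.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical timestamps along the round trip, in nanoseconds.
struct RoundTripTimestamps {
  std::int64_t exchange_sent_ns;       // exchange publishes the information
  std::int64_t participant_recv_ns;    // participant receives it
  std::int64_t participant_sent_ns;    // participant sends out its decision
  std::int64_t exchange_processed_ns;  // matching engine finishes processing it
};

struct RoundTripBreakdown {
  std::int64_t inbound_ns;   // exchange -> participant
  std::int64_t decision_ns;  // analysis and decision making
  std::int64_t outbound_ns;  // participant -> exchange, including matching
  std::int64_t total() const { return inbound_ns + decision_ns + outbound_ns; }
};

inline RoundTripBreakdown breakDownRoundTrip(const RoundTripTimestamps& ts) {
  // The timestamps must be in chronological order.
  assert(ts.exchange_sent_ns <= ts.participant_recv_ns);
  assert(ts.participant_recv_ns <= ts.participant_sent_ns);
  assert(ts.participant_sent_ns <= ts.exchange_processed_ns);
  return {ts.participant_recv_ns - ts.exchange_sent_ns,
          ts.participant_sent_ns - ts.participant_recv_ns,
          ts.exchange_processed_ns - ts.participant_sent_ns};
}
```

Keeping the three components separate, rather than only the total, shows which part of the round trip is worth optimizing.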

Tick-to-trade

Tick-to-trade (TTT) is similar to RTT and is a term most commonly used in electronic trading systems. TTT is defined as the time from when a packet (usually a market data packet) first hits a participant’s infrastructure (trading server) to the time when the participant is done processing the packet and sends a packet out (order request) to the trading exchange. So, TTT includes the time spent by the trading infrastructure to read the packet, process it, calculate trading signals, generate an order request in reaction to that, and put it on the wire. Putting it on the wire typically means writing something to a network socket. We will revisit this topic and explore it in greater detail in the last section of this book, Analyzing and improving performance. In Figure 1.1, TTT would be the difference between and .
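As a rough illustration of the TTT definition, the following sketch records when a market data packet arrives and computes the elapsed time when the resulting order is written out. The class TickToTradeRecorder and its method names are hypothetical, and the caller is assumed to supply timestamps in nanoseconds from a single monotonic clock on the trading server.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical recorder for a single tick-to-trade measurement.
class TickToTradeRecorder {
 public:
  // Call when the market data packet first reaches the trading server.
  void onMarketDataReceived(std::int64_t now_ns) { tick_ns_ = now_ns; }

  // Call when the order request is put on the wire; returns the TTT for the
  // order triggered by the most recently received packet.
  std::int64_t onOrderSent(std::int64_t now_ns) const {
    assert(now_ns >= tick_ns_);
    return now_ns - tick_ns_;
  }

 private:
  std::int64_t tick_ns_ = 0;
};
```

Unlike RTT, both timestamps here are taken on the participant's own infrastructure, so no cross-host clock synchronization is needed.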

CPU clock cycles

A CPU clock cycle is the smallest unit of time in which the processor does work. Strictly speaking, it is the amount of time between two pulses of the oscillator that drives the processor. Measuring CPU clock cycles is typically used to measure latency at the instruction level – that is, at an extremely low level within the processor. C++ is both a low-level and a high-level language; it lets you get as close to the hardware as needed and also provides higher-level abstractions such as classes, templates, and so on. But generally, C++ developers do not spend a lot of time dealing with extremely low-level code or assembly code. This means that the compiled machine code might not be exactly what a C++ developer expects. Additionally, the compiler version, the processor architecture, and so on can introduce further differences. So, for extremely performance-sensitive low latency code, it is not uncommon for engineers to measure how many instructions are executed and how many CPU clock cycles are required to do so. This is typically the finest-grained level of optimization possible, alongside kernel-level optimizations.
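One common way to count cycles on x86 processors with GCC or Clang is the __rdtsc() intrinsic, which reads the processor's timestamp counter. The sketch below is an illustration under those assumptions and not the book's implementation; the function names are hypothetical, and on other platforms it falls back to std::chrono::steady_clock ticks, which are not cycles. Keep in mind that the counter is not a serializing instruction and, on most modern x86 processors, ticks at a constant reference rate rather than at the current core frequency, so the result is best read as an approximation.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <x86intrin.h>
#define HAVE_RDTSC 1
#endif

// Reads a cheap, monotonically increasing counter.
inline std::uint64_t readCycleCounter() {
#ifdef HAVE_RDTSC
  return __rdtsc();  // x86 timestamp counter
#else
  // Fallback assumption: clock ticks instead of cycles on other platforms.
  return static_cast<std::uint64_t>(
      std::chrono::steady_clock::now().time_since_epoch().count());
#endif
}

// Hypothetical helper: average counter ticks per call over many iterations,
// which amortizes the cost of reading the counter itself.
template <typename Task>
double averageCyclesPerCall(Task&& task, std::uint64_t iterations) {
  assert(iterations > 0);
  const std::uint64_t start = readCycleCounter();
  for (std::uint64_t i = 0; i < iterations; ++i) {
    task();
  }
  const std::uint64_t finish = readCycleCounter();
  return static_cast<double>(finish - start) / static_cast<double>(iterations);
}
```

Averaging over many iterations smooths out one-off effects such as cache misses on the first call, at the cost of hiding exactly those outliers.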

Now that we have seen some different methods of measuring latencies in different applications, in the next section, we will look at some latency summary metrics and how each one of them can be important under different scenarios.

Differentiating between latency metrics

The relative importance of a specific latency metric over the others depends on the application and the business itself. As an example, a latency-critical application such as an autonomous vehicle software system cares about peak latency much more than the mean latency. Low latency electronic trading systems typically care more about mean latency and smaller latency variance than they do about peak latency. Video streaming and playback applications might generally prioritize high throughput over lower latency variance due to the nature of the application and the consumers.

Throughput versus latency

Before we look at the metrics themselves, first, we need to clearly understand the difference between two terms – throughput and latency – which are very similar to each other and often used interchangeably but should not be. Throughput is defined as how much work gets done in a certain period of time, and latency is how quickly a single task is completed. To improve throughput, the usual approach is to introduce parallelism and add additional computing, memory, and networking resources. Note that each individual task might not be processed as quickly as possible, but overall, more tasks will be completed after a certain amount of time. This is because, while being processed individually, each task might take longer than in a low latency setup, but the parallelism boosts throughput over a set of tasks. Latency, on the other hand, is measured for each individual task from beginning to finish, even if fewer tasks are executed overall.
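A small worked example may help separate the two terms. The sketch below uses a deliberately idealized model, which is an assumption for illustration only: every worker processes tasks back to back, and adding workers scales throughput perfectly. The names PipelineProfile and throughputPerSecond are hypothetical.

```cpp
#include <cassert>

// Hypothetical description of a processing setup.
struct PipelineProfile {
  double task_latency_us;  // time to finish one task on one worker
  int workers;             // number of tasks processed in parallel
};

// Tasks completed per second under the idealized perfect-scaling assumption.
inline double throughputPerSecond(const PipelineProfile& profile) {
  assert(profile.task_latency_us > 0.0);
  assert(profile.workers > 0);
  const double tasks_per_worker_per_second = 1e6 / profile.task_latency_us;
  return profile.workers * tasks_per_worker_per_second;
}
```

With these assumptions, a single worker that finishes each task in 10 microseconds completes 100,000 tasks per second, while eight workers that each take 40 microseconds per task complete 200,000 tasks per second. The second setup has twice the throughput even though every individual task has four times the latency.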

Mean latency

Mean latency is the expected average response time of a system. It is simply the average of all the latency measurement observations. This metric includes large outliers, so it can be a noisy metric for systems that experience a large range of performance latencies.

Median latency

Median latency is typically a better metric for the expected response time of a system. Since it is the median of the latency measurement observations, it excludes the impact of large outliers. Due to this, it is sometimes preferred over the mean latency metric.

Peak latency

Peak latency is an important metric for systems where a single large outlier in performance can have a devastating impact on the system. Large values of peak latency can also significantly influence the mean latency metric of the system.

Latency variance

For systems that require a latency profile that is as deterministic as possible, the actual variance of the performance latency is an important metric. This is typically important where the expected latencies need to be quite predictable. For systems with low latency variance, the mean, median, and peak latencies are all expected to be quite close to each other.
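The four metrics above can be computed from a set of latency observations as in the following sketch. The names LatencySummary and summarizeLatencies are hypothetical, and the variance shown is the population variance, which is one reasonable choice among several.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

struct LatencySummary {
  double mean;
  double median;
  double peak;
  double variance;  // population variance
};

// Takes the samples by value because sorting reorders them.
inline LatencySummary summarizeLatencies(std::vector<double> samples) {
  assert(!samples.empty());
  std::sort(samples.begin(), samples.end());
  const std::size_t n = samples.size();

  const double mean =
      std::accumulate(samples.begin(), samples.end(), 0.0) / static_cast<double>(n);
  // Middle element for odd n, average of the two middle elements for even n.
  const double median = (n % 2 == 1)
                            ? samples[n / 2]
                            : (samples[n / 2 - 1] + samples[n / 2]) / 2.0;
  double squared_deviation = 0.0;
  for (const double s : samples) {
    squared_deviation += (s - mean) * (s - mean);
  }
  return {mean, median, samples.back(), squared_deviation / static_cast<double>(n)};
}
```

For the observations 10, 11, 12, 11, and 500 (in any unit), the mean is 108.8 while the median is 11 and the peak is 500, which shows how a single outlier dominates the mean and the peak but leaves the median untouched.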

Requirements of latency-sensitive applications

In this section, we will formally describe the behavior of latency-sensitive applications and the performance profile that these applications are expected to adhere to. Obviously, latency-sensitive applications need low latency performance, but here we will try to explore minor subtleties in the term low latency and discuss some different ways of looking at it.

Correctness and robustness

When we think of latency-sensitive applications, we often assume that low latency is the single most important aspect of such applications. But in reality, a huge requirement of such applications is correctness, by which we mean very high levels of robustness and fault tolerance. Intuitively, this idea should make complete sense; these applications require very low latency to be successful, which should tell you that they also have very high throughput and need to process huge amounts of inputs and produce a large number of outputs. Hence, the system needs to achieve very close to 100% correctness and be very robust for the application to be successful in its business area. Additionally, the correctness and robustness requirements need to be maintained as the application grows and changes during its lifetime.

Low latencies on average

This is the most obvious requirement when we think about latency-sensitive applications. The expected reaction or processing latency needs to be as low as possible for the application or business overall to succeed. Here, we care about the mean and median performance latency and need it to be as low as possible. By design, this means the system cannot have too many outliers or very high peaks in performance latency.

Capped peak latency

We use the term capped peak latency to refer to the requirement that there needs to be a well-defined upper threshold for the maximum possible latency the application can ever encounter. This behavior is important for all latency-sensitive applications, but most important for latency-critical applications. Even in the general case, an application that has extremely high performance latency for just a handful of cases will typically destroy the performance of the system. What this really means is that the application needs to handle any input, scenario, or sequence of events and do so within a low latency period. Of course, the latency to handle a very rare and specific scenario can be much higher than that of the most likely case, but the point here is that it cannot be unbounded or unacceptable.
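One simple way to check this requirement against measured data is to compare every observation with the cap and report the worst case. The sketch below is illustrative only; LatencyCapReport and checkLatencyCap are hypothetical names, and the cap value would come from the application's own requirements.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct LatencyCapReport {
  std::size_t violations;  // number of observations above the cap
  double worst_us;         // largest observation seen
  bool withinCap() const { return violations == 0; }
};

// Compares each latency observation (in microseconds) with the cap.
inline LatencyCapReport checkLatencyCap(const std::vector<double>& samples_us,
                                        double cap_us) {
  assert(!samples_us.empty());
  LatencyCapReport report{0, samples_us.front()};
  for (const double sample : samples_us) {
    report.worst_us = std::max(report.worst_us, sample);
    if (sample > cap_us) {
      ++report.violations;
    }
  }
  return report;
}
```

A check like this only shows that the cap held for the observed inputs; guaranteeing a bound for every possible input is a design property and cannot be established by measurement alone.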

Predictable latency – low latency variance

Some applications prefer that the expected performance latency is predictable, even if that means the average latency is a little higher than it could be. What this really means is that such applications will make sure that the expected performance latency for all kinds of different inputs or events has as little variance as possible. It is impossible to achieve zero latency variance, but some choices can be made in data structures, algorithms, code implementation, and setup to try to minimize this as much as possible.

High throughput

As mentioned before, low latency and throughput are related but not identical. For that reason, applications that need the highest throughput possible might differ in design and implementation to maximize throughput. The point is that maximizing throughput might come at the cost of higher average performance latencies or higher peak latencies.

In this section, we introduced the concepts that apply to low latency application performance and the business impact of those metrics. We will need these concepts in the rest of the book when we refer to the performance of the applications we build. Next, we will move the conversation forward and explore the programming languages available for low latency application development. We will discuss the characteristics of the languages that support low latency applications and understand why C++ has risen to the top of the list when it comes to developing and improving latency-sensitive applications.

Key benefits

Understand the impact application performance latencies have on different business use cases

Develop a deep understanding of C++ features for low latency applications through real-world examples and performance data

Learn how to build all the components of a C++ electronic trading system from scratch

Description

C++ is meticulously designed with efficiency, performance, and flexibility as its core objectives. However, real-time low latency applications demand a distinct set of requirements, particularly in terms of performance latencies. With this book, you’ll gain insights into the performance requirements for low latency applications and the C++ features critical to achieving the required performance latencies. You’ll also solidify your understanding of the C++ principles and techniques as you build a low latency system in C++ from scratch. You’ll understand the similarities between such applications, recognize the impact of performance latencies on business, and grasp the reasons behind the extensive efforts invested in minimizing latencies. Using a step-by-step approach, you’ll embark on a low latency app development journey by building an entire electronic trading system, encompassing a matching engine, market data handlers, order gateways, and trading algorithms, all in C++. Additionally, you’ll get to grips with measuring and optimizing the performance of your trading system. By the end of this book, you’ll have a comprehensive understanding of how to design and build low latency applications in C++ from the ground up, while effectively minimizing performance latencies.

Who is this book for?

This book is for C++ developers who want to gain expertise in low latency applications and effective design and development strategies. C++ software engineers looking to apply their knowledge to low latency trading systems such as HFT will find this book useful to understand which C++ features matter and which ones to avoid. Quantitative researchers in the trading industry eager to delve into the intricacies of low latency implementation will also benefit from this book. Familiarity with Linux and the C++ programming language is a prerequisite for this book.

What you will learn

Gain insights into the nature of low latency applications across various industries

Understand how to design and implement low latency applications

Explore C++ design paradigms and features for low latency development

Discover which C++ features are best avoided in low latency development

Implement best practices and C++ features for low latency

Measure performance and improve latencies in the trading system

Salim Pamukcu Jan 19, 2024

Awesome book with detailed, useful explanations and real-time cases

Feefo Verified review

POE Sep 12, 2023

This book is well written and covers two important areas: developing low latency applications with C++ (as the title suggests), and electronic trading systems. The author’s expertise in both areas is evident throughout the book. There are ample code examples and plethora of topics including Internet of Things (IoT), memory pool abstraction, performance, optimizations, instrumentation, and more. A set of utilities and classes are provided to help support network socket operations. The author also walks readers through the design and development of a trading system. If you are a serious C++ developer, want to learn how to write low latency applications, or are just interested in electronic trading systems, then this book is for you. Great resource!

Amazon Verified review

Wayne Oct 16, 2023

I primarily use C# at work, but in college, I spent time using C++ for algo/data structures courses, but we never went this in-depth. If you are looking to get into C++ and have either taken a 101 and 102 in C++ or have been using Java/C# for a bit, this is a great book if you want to check out C++ on a much deeper level. There are a ton of things you likely won't know if you aren't already in the trading space, and even if it doesn't fulfill every feature required, you'll have a lot of tools to do that yourself. I used this book to check out C++ again, and it was worth it.

Reader Sep 08, 2023

From the moment I cracked open the book, I was struck by the author's evident passion for their subject matter. The depth of research and meticulous attention to detail is immediately apparent, and it's clear that this is the work of an expert in their field. This level of expertise truly enhances the reading experience and provides a sense of trustworthiness that is invaluable when diving into a complex subject.

Guanqi Oct 26, 2023

Great book for low latency trading

Building Low Latency Applications with C++: Develop a complete low latency trading ecosystem from scratch using modern C++
