This article is an excerpt from the book, "Observability with Grafana", by Rob Chapman and Peter Holmes. This book provides a holistic understanding of observability concepts using the Grafana Labs tools, teaching you how to fully leverage the LGTM stack.
PromQL, or Prometheus Query Language, is a powerful tool designed to work with Prometheus, an open-source systems monitoring and alerting toolkit that has become a crucial component of modern infrastructure monitoring. PromQL allows users to query data stored in Prometheus, which makes it possible to build insightful dashboards and to set up alerts based on the performance metrics of applications and systems. This article explores the core functionality of PromQL, including how it interacts with metrics data and how it can be used to monitor and analyze system performance effectively.
Prometheus was initially developed by SoundCloud in 2012; the project was accepted by the Cloud Native Computing Foundation in 2016 as the second incubated project (after Kubernetes), and version 1.0 was released shortly after. PromQL is an integral part of Prometheus; it is used to query stored data and to produce dashboards and alerts.
Before we delve into the details of the language, let’s briefly look at the following ways in which Prometheus-compatible systems interact with metrics data:
Figure 5.1 – A simplified view of metric data stored in the TSDB
* Each unique __name__ value creates a metric. In the preceding figure, the metric is app_frontend_requests.
* Each unique set of labels creates a time series. In the preceding figure, the full set of labels identifies the time series.
* A time series contains multiple samples, each with a unique timestamp. The preceding figure shows a single sample, but over time, multiple samples will be collected for each time series.
* The number of unique values for a metric label is referred to as the cardinality of the label. High-cardinality labels should be avoided: every new label value creates a new time series, which significantly increases the storage cost of the metric.
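To make these relationships concrete, here is a minimal Python sketch of how samples group into time series and metrics. The label name `method`, its values, and all timestamps and sample values are hypothetical, and a real TSDB stores data very differently; the sketch only models the rules in the list above.

```python
from collections import defaultdict

# Hypothetical raw samples: (labels, timestamp in seconds, value).
# As in Prometheus, the metric name is stored in the "__name__" label.
raw_samples = [
    ({"__name__": "app_frontend_requests", "method": "GET"}, 1000, 5.0),
    ({"__name__": "app_frontend_requests", "method": "GET"}, 1015, 7.0),
    ({"__name__": "app_frontend_requests", "method": "GET"}, 1030, 9.0),
    ({"__name__": "app_frontend_requests", "method": "POST"}, 1000, 1.0),
    ({"__name__": "app_frontend_requests", "method": "POST"}, 1015, 2.0),
]


def series_key(labels):
    """Identify a time series by its full label set, independent of order."""
    return tuple(sorted(labels.items()))


# Group samples into time series: label set -> list of (timestamp, value).
tsdb = defaultdict(list)
for labels, ts, value in raw_samples:
    tsdb[series_key(labels)].append((ts, value))


def metric_names(db):
    """Each unique __name__ value is a metric."""
    return {dict(key)["__name__"] for key in db}


def label_cardinality(db, metric, label):
    """Count the distinct values that `label` takes within one metric."""
    values = set()
    for key in db:
        labels = dict(key)
        if labels["__name__"] == metric and label in labels:
            values.add(labels[label])
    return len(values)


# One metric, two time series, five samples.
print(len(tsdb), "series,", sum(len(s) for s in tsdb.values()), "samples")
# prints: 2 series, 5 samples
```

Adding a sample with a new `method` value, say `"PUT"`, would add a third key to `tsdb`, that is, a third time series. A label such as a user ID with thousands of distinct values would add thousands of series in the same way, which is why high-cardinality labels are costly.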
The following diagram shows a single metric containing two time series and five samples:
Figure 5.2 – An example of samples from multiple time series
In Grafana, we can see a representation of the time series and samples from a metric. To do this, follow these steps:
1. In your Grafana instance, select Explore in the menu.
2. Choose your Prometheus data source, which will be labeled as grafanacloud-<team>-prom (default).
3. In the Metric dropdown, choose app_frontend_requests_total. Under Options, set Format to Table, and then click on Run query. This will show you all the samples and time series in the metric over the selected time range. You should see data like this:
Figure 5.3 – Visualizing the samples and time series that make up a metric
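Explore builds queries like this for you, but the same data can also be requested directly from a Prometheus-compatible HTTP API through the standard `/api/v1/query_range` endpoint. The sketch below is a minimal example that assumes a local Prometheus at the hypothetical address `http://localhost:9090`; it only builds the request URL and does not send it. A Grafana Cloud stack uses its own endpoint and credentials, which are not shown here.

```python
from urllib.parse import urlencode


def query_range_url(base_url, query, start, end, step):
    """Build a URL for the Prometheus HTTP API range-query endpoint.

    start and end are Unix timestamps in seconds; step is the resolution
    between returned points (for example "60s").
    """
    params = {"query": query, "start": start, "end": end, "step": step}
    return f"{base_url.rstrip('/')}/api/v1/query_range?{urlencode(params)}"


# Hypothetical local Prometheus and a one-hour window.
url = query_range_url(
    "http://localhost:9090",
    "app_frontend_requests_total",
    start=1700000000,
    end=1700003600,
    step="60s",
)
print(url)
```

`urlencode` takes care of escaping, which matters as soon as the query contains label matchers with braces and quotes.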
Now that we understand the data structure, let’s explore PromQL.
In this section, we will take you through the main features of PromQL. We will start with an explanation of the data types, and then we will look at how to select data, how to combine multiple datasets, and how to use functions. Because PromQL is a query language, knowing how to manipulate data with it is key to producing alerts and dashboards.
Data types
PromQL offers three data types: instant vectors, range vectors, and scalars. They are important because the functions and operators in PromQL work differently depending on the data types they are given.
Figure 5.4 – An instant vector
Figure 5.5 – Range vectors
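The difference between the two vector types can be sketched in a few lines of Python. The sketch makes simplifying assumptions: one time series, sorted hypothetical samples, no staleness handling, and window-boundary handling that may differ slightly between Prometheus versions. Under those assumptions, an instant vector selector such as app_frontend_requests_total picks the newest sample within the default 5-minute lookback window, while a range vector selector such as app_frontend_requests_total[2m] returns every sample in the window. A scalar is simply a single floating-point number with no labels.

```python
# Hypothetical samples for a single time series: (timestamp in seconds, value),
# sorted by timestamp.
samples = [(0, 1.0), (60, 2.0), (120, 4.0), (180, 7.0), (240, 11.0)]

LOOKBACK_SECONDS = 300  # Prometheus's default lookback delta is 5 minutes


def instant_sample(samples, t, lookback=LOOKBACK_SECONDS):
    """Instant vector element at time t: the newest sample within the lookback.

    Returns None when the series has no recent sample, in which case the
    series is simply absent from the instant vector.
    """
    recent = [(ts, v) for ts, v in samples if t - lookback < ts <= t]
    return recent[-1] if recent else None


def range_samples(samples, t, window):
    """Range vector element at time t: every sample in the window ending at t."""
    return [(ts, v) for ts, v in samples if t - window < ts <= t]


# Like evaluating the plain selector at t=200, and the [2m] selector at t=240.
print(instant_sample(samples, 200))      # (180, 7.0)
print(range_samples(samples, 240, 120))  # [(180, 7.0), (240, 11.0)]
```

Range vectors cannot be graphed directly; functions such as rate() take a range vector and return an instant vector that can be.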
PromQL offers several tools for you to select data to show in a dashboard or a list, or just to understand a system’s state. Some of these are described in the following table:
Table 5.1 – The selection operators available in PromQL
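Label matchers are among the most common ways to select data in PromQL. The sketch below applies the four matcher operators (=, !=, =~, !~) to hypothetical series, mimicking a selector such as app_frontend_requests_total{status=~"5..", method!="POST"}; the status and method labels are assumptions. It models two PromQL behaviors: a regular expression must match the whole label value, and a missing label behaves like an empty string. Python's re module stands in for the RE2 syntax that Prometheus actually uses.

```python
import re


def matches(labels, matchers):
    """Check a series' labels against (name, operator, value) matchers.

    A missing label is treated as an empty string, and regular expressions
    must match the entire label value, as in PromQL.
    """
    for name, op, value in matchers:
        actual = labels.get(name, "")
        if op == "=":
            ok = actual == value
        elif op == "!=":
            ok = actual != value
        elif op == "=~":
            ok = re.fullmatch(value, actual) is not None
        elif op == "!~":
            ok = re.fullmatch(value, actual) is None
        else:
            raise ValueError(f"unknown matcher operator: {op}")
        if not ok:
            return False
    return True


# Hypothetical series for app_frontend_requests_total.
series = [
    {"__name__": "app_frontend_requests_total", "status": "200", "method": "GET"},
    {"__name__": "app_frontend_requests_total", "status": "500", "method": "POST"},
    {"__name__": "app_frontend_requests_total", "status": "503", "method": "GET"},
]

# Equivalent of app_frontend_requests_total{status=~"5..", method!="POST"}
selector = [
    ("__name__", "=", "app_frontend_requests_total"),
    ("status", "=~", "5.."),
    ("method", "!=", "POST"),
]
selected = [s for s in series if matches(s, selector)]
```

Only the third series survives: the first fails the status regex, and the second is excluded by method!="POST".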
In addition to the operators that allow us to select data, PromQL offers operators that combine and compare multiple sets of data.
Operators between two datasets
Some data is easily provided by a single metric, while other useful information needs to be created from multiple metrics. The following operators allow you to combine datasets.
Table 5.2 – The comparison operators available in PromQL
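Comparison operators have a behavior that often surprises newcomers: applied to an instant vector and a scalar, they filter the vector rather than returning true or false, unless the bool modifier is added. The minimal sketch below models both forms for queries like some_ratio > 0.05 and some_ratio > bool 0.05; the metric name some_ratio and its per-instance values are hypothetical, and label sets are represented as plain strings for brevity.

```python
import operator

# PromQL comparison operators mapped to Python functions.
COMPARISONS = {
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    "<": operator.lt,
    ">=": operator.ge,
    "<=": operator.le,
}


def compare(vector, op, scalar, use_bool=False):
    """Compare every element of an instant vector with a scalar.

    Without the bool modifier, elements that fail the comparison are dropped
    and the rest keep their original values. With bool, every element is
    kept and its value becomes 1.0 (true) or 0.0 (false).
    """
    test = COMPARISONS[op]
    if use_bool:
        return {labels: float(test(v, scalar)) for labels, v in vector.items()}
    return {labels: v for labels, v in vector.items() if test(v, scalar)}


# Hypothetical per-instance values of some_ratio.
some_ratio = {'{instance="a"}': 0.02, '{instance="b"}': 0.07}

filtered = compare(some_ratio, ">", 0.05)              # some_ratio > 0.05
flags = compare(some_ratio, ">", 0.05, use_bool=True)  # some_ratio > bool 0.05
```

The filtering form suits alert conditions, where only offending series should remain; the bool form suits dashboards that need a 0/1 value for every series.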
Vector matching can be confusing at first, so let's work through examples of its three cases: one-to-one, one-to-many/many-to-one, and many-to-many.
By default, when two vectors are combined, all label names and values must match. This means that for each element of the first vector, the operator tries to find exactly one element in the second vector with an identical set of labels.
Let’s consider a simple example:
| Vector A | Vector B | A{} + B{} | A{} + on (color) B{} |
|---|---|---|---|
| 10{color=blue,smell=ocean} | 19{color=blue,smell=ocean} | 29{color=blue,smell=ocean} | 29{color=blue} |
| 31{color=red,smell=cinnamon} | 8{color=red,smell=cinnamon} | 39{color=red,smell=cinnamon} | 39{color=red} |
| 27{color=green,smell=grass} | 14{color=green,smell=jungle} | | 41{color=green} |
When color=blue and smell=ocean, A{} + B{} gives 10 + 19 = 29, and when color=red and smell=cinnamon, A{} + B{} gives 31 + 8 = 39. The green elements have different smell labels (grass and jungle), so they do not match and are dropped from the result.
When we sum the vectors using on (color), only the color label is used for matching, so the two green elements now match and are summed (27 + 14 = 41). Because matching is restricted to color, the result keeps only the color label.
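The one-to-one matching in this example can be modeled in Python as a sketch. Vectors are lists of (labels, value) pairs taken from the example; the function name add_one_to_one is hypothetical, and its error messages are illustrative rather than the ones Prometheus prints.

```python
def match_key(labels, on=None):
    """Labels used for matching: all of them by default, or only those in `on`."""
    if on is None:
        return tuple(sorted(labels.items()))
    return tuple(sorted((k, v) for k, v in labels.items() if k in on))


def add_one_to_one(a, b, on=None):
    """Model A + B (or A + on(...) B) with one-to-one vector matching.

    a and b are lists of (labels, value) pairs. Each side may contain at most
    one element per match key; otherwise real PromQL needs group_left/right.
    """
    b_index = {}
    for labels, value in b:
        key = match_key(labels, on)
        if key in b_index:
            raise ValueError(f"several elements of B match {dict(key)}")
        b_index[key] = value

    result, seen = [], set()
    for labels, value in a:
        key = match_key(labels, on)
        if key in seen:
            raise ValueError(f"several elements of A match {dict(key)}")
        seen.add(key)
        if key in b_index:
            # The result carries only the labels that were used for matching.
            result.append((dict(key), value + b_index[key]))
    return result


A = [({"color": "blue", "smell": "ocean"}, 10),
     ({"color": "red", "smell": "cinnamon"}, 31),
     ({"color": "green", "smell": "grass"}, 27)]
B = [({"color": "blue", "smell": "ocean"}, 19),
     ({"color": "red", "smell": "cinnamon"}, 8),
     ({"color": "green", "smell": "jungle"}, 14)]

print(add_one_to_one(A, B))                # A{} + B{}
print(add_one_to_one(A, B, on={"color"}))  # A{} + on (color) B{}
```

Indexing B by its match key first keeps the lookup for each element of A constant-time, and the duplicate checks mirror why Prometheus refuses ambiguous one-to-one matches.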
This example works when there is a one-to-one relationship of labels between vector A and vector B. However, sometimes there is a many-to-one or one-to-many relationship: vector A or vector B may have more than one element that matches a single element of the other vector. In these cases, Prometheus returns an error unless a grouping modifier (group_left or group_right) is used. Let's look at another example to illustrate this:
| Vector A | Vector B | A{} + on (color) group_left B{} |
|---|---|---|
| 7{color=blue,smell=ocean} | 20{color=blue,smell=ocean} | 27{color=blue,smell=ocean} |
| 5{color=red,smell=cinnamon} | 8{color=red,smell=cinnamon} | 13{color=red,smell=cinnamon} |
| 2{color=blue,smell=powder} | 14{color=green,smell=jungle} | 22{color=blue,smell=powder} |
Now, there are two different elements in vector A with color=blue. The group_left modifier keeps the labels from vector A (the "many" side) but matches only on color. As a result, the third element of the combined vector is 22{color=blue,smell=powder} (2 + 20), even though the matching element in vector B has a different smell (ocean). The group_right modifier behaves the same way in the opposite direction, keeping the labels from vector B.
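A minimal Python sketch of A{} + on (color) group_left B{}, using the example vectors from the table above, shows how several elements of A can share one match in B while each result keeps the labels of its A element. The function name add_group_left is hypothetical, and error handling is simplified.

```python
def add_group_left(a, b, on):
    """Model A + on(<on>) group_left B, where A is the "many" side.

    a and b are lists of (labels, value) pairs. Several elements of A may
    match the same element of B, and each result keeps the labels of A.
    """
    def key(labels):
        return tuple(sorted((k, v) for k, v in labels.items() if k in on))

    b_index = {}
    for labels, value in b:
        k = key(labels)
        if k in b_index:
            raise ValueError(f"B must be unique per match group, got {dict(k)} twice")
        b_index[k] = value

    return [(dict(labels), value + b_index[key(labels)])
            for labels, value in a if key(labels) in b_index]


A = [({"color": "blue", "smell": "ocean"}, 7),
     ({"color": "red", "smell": "cinnamon"}, 5),
     ({"color": "blue", "smell": "powder"}, 2)]
B = [({"color": "blue", "smell": "ocean"}, 20),
     ({"color": "red", "smell": "cinnamon"}, 8),
     ({"color": "green", "smell": "jungle"}, 14)]

result = add_group_left(A, B, on={"color"})
```

Because + is commutative, add_group_left(B, A, on) gives the same values and labels as A{} + on (color) group_right B{}; for non-commutative operators, the operand order would have to be preserved explicitly.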
The final option is a many-to-many vector match. These matches use the logical operators (and, unless, or) to combine parts of vectors A and B. Let's see some examples:
| Vector A | Vector B | A{} and B{} | A{} unless B{} | A{} or B{} |
|---|---|---|---|---|
| 10{color=blue,smell=ocean} | 19{color=blue,smell=ocean} | 10{color=blue,smell=ocean} | | 10{color=blue,smell=ocean} |
| 31{color=red,smell=cinnamon} | 8{color=red,smell=cinnamon} | 31{color=red,smell=cinnamon} | | 31{color=red,smell=cinnamon} |
| 27{color=green,smell=grass} | 14{color=green,smell=jungle} | | 27{color=green,smell=grass} | 27{color=green,smell=grass} |
| | | | | 14{color=green,smell=jungle} |
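Unlike the previous examples, no arithmetic is performed here, so every value in the results comes unchanged from one of the input vectors. A{} and B{} returns the elements of A that have a matching label set in B; A{} unless B{} returns the elements of A that have no match in B; and A{} or B{} returns all the elements of A plus the elements of B that have no match in A (here, 14{color=green,smell=jungle}).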
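The three logical operators can be sketched as set operations on label sets. This minimal Python version uses the example vectors from the table above; the function names are hypothetical, and the sketch preserves input order, which Prometheus does not guarantee.

```python
def label_set(labels):
    """Order-independent, hashable form of a label set."""
    return tuple(sorted(labels.items()))


def vec_and(a, b):
    """A and B: elements of A whose label set also appears in B."""
    b_keys = {label_set(labels) for labels, _ in b}
    return [(labels, v) for labels, v in a if label_set(labels) in b_keys]


def vec_unless(a, b):
    """A unless B: elements of A whose label set does not appear in B."""
    b_keys = {label_set(labels) for labels, _ in b}
    return [(labels, v) for labels, v in a if label_set(labels) not in b_keys]


def vec_or(a, b):
    """A or B: all of A, plus elements of B whose label set is not in A."""
    a_keys = {label_set(labels) for labels, _ in a}
    return list(a) + [(labels, v) for labels, v in b
                      if label_set(labels) not in a_keys]


A = [({"color": "blue", "smell": "ocean"}, 10),
     ({"color": "red", "smell": "cinnamon"}, 31),
     ({"color": "green", "smell": "grass"}, 27)]
B = [({"color": "blue", "smell": "ocean"}, 19),
     ({"color": "red", "smell": "cinnamon"}, 8),
     ({"color": "green", "smell": "jungle"}, 14)]

print([v for _, v in vec_and(A, B)])     # [10, 31]
print([v for _, v in vec_unless(A, B)])  # [27]
print([v for _, v in vec_or(A, B)])      # [10, 31, 27, 14]
```

Only `or` can return a value from B, and only for label sets that A lacks; `and` and `unless` always return values from A.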
PromQL is an essential component of Prometheus, offering users a flexible and powerful means of querying and analyzing time-series data. By understanding its data types and operators, users can craft complex queries that provide deep insights into system performance. The language supports a variety of data selection and comparison operations, allowing for precise monitoring and alerting. Whether working with instant vectors, range vectors, or scalars, PromQL enables developers and operators to optimize their use of Prometheus for monitoring and alerting, ensuring systems remain performant and reliable. As organizations continue to embrace cloud-native architectures, mastering PromQL becomes increasingly vital for maintaining robust and efficient systems.
Rob Chapman is a creative IT engineer and founder at The Melt Cafe, with two decades of experience in the full application life cycle. Working over the years for companies such as the Environment Agency, BT Global Services, Microsoft, and Grafana, Rob has built a wealth of experience on large complex systems. More than anything, Rob loves saving energy, time, and money and has a track record for bringing production-related concerns forward so that they are addressed earlier in the development cycle, when they are cheaper and easier to solve. In his spare time, Rob is a Scout leader, and he enjoys hiking, climbing, and, most of all, spending time with his family and six children.
Peter Holmes is a senior engineer with a deep interest in digital systems and how to use them to solve problems. With over 16 years of experience, he has worked in various roles in operations. Working at organizations such as Boots UK, Fujitsu Services, Anaplan, Thomson Reuters, and the NHS, he has experience in complex transformational projects, site reliability engineering, platform engineering, and leadership. Peter has a history of taking time to understand the customer and ensuring Day-2+ operations are as smooth and cost-effective as possible.