[box type="note" align="" class="" width=""]This article is an excerpt from the book Learning Elastic Stack 6.0 written by Pranav Shukla and Sharath Kumar M N. This book provides detailed coverage of the fundamentals of each component of the Elastic Stack, making it easy to search, analyze, and visualize data across different sources in real time.[/box]
Today, we are going to demonstrate how to run numeric and statistical queries, such as summation, average, and count, using metric aggregations in the Elastic Stack, turning it into a powerful analytics engine for your dataset.
Metric aggregations work with numeric data, computing one or more aggregate metrics within the given context. The context could be a query, a filter, or no query at all, in which case the whole index/type is included. Metric aggregations can also be nested inside other bucket aggregations; in this case, the metrics are computed once for each bucket generated by the bucket aggregation.
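For instance, here is a minimal sketch of a sum metric nested inside a terms bucket aggregation, assuming username is indexed as a keyword field; the nested download_sum metric is computed separately for each user bucket:
GET bigginsight/_search
{
  "aggregations": {
    "by_user": {
      "terms": {
        "field": "username"
      },
      "aggregations": {
        "download_sum": {
          "sum": {
            "field": "downloadTotal"
          }
        }
      }
    }
  },
  "size": 0
}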
We will start with simple metric aggregations without nesting them inside bucket aggregations. When we learn about bucket aggregations later in the chapter, we will also learn how to use metric aggregations inside bucket aggregations.
We will learn about the following metric aggregations: sum, avg, min, max, stats, extended stats, and cardinality.
Let us learn about them one by one.
Finding the sum of a field, the minimum value of a field, the maximum value of a field, or the average is a very common operation. For those familiar with SQL, the query to find the sum would look like the following:
SELECT sum(downloadTotal) FROM usageReport;
The preceding query will calculate the sum of the downloadTotal field across all records in the table. This requires going through all records of the table, or all records in the given context, and adding the values of the given field.
In Elasticsearch, a similar query can be written using the sum aggregation. Let us understand the sum aggregation first.
Here is how to write a simple sum aggregation:
GET bigginsight/_search
{
"aggregations": { 1
"download_sum": { 2
"sum": { 3
"field": "downloadTotal" 4
}
}
},
"size": 0 5
}
The key parts of the request are numbered 1 to 5 and are explained in the following points:

1. The aggregations element at the top level wraps all the aggregations in the request; the shorthand aggs works as well.
2. We give a name to the aggregation. Here we chose the name download_sum, since we are summing up the downloadTotal field.
3. The type of aggregation that we want to perform, sum in this case.
4. The field on which the aggregation is to be performed, downloadTotal in this case.
5. Specify size = 0 to prevent raw search results from being returned. We just want aggregation results and not the search results in this case. Since we haven't specified any top level query elements, it matches all documents. We do not want any raw documents (or search hits) in the result.
The response should look like the following:
{
"took": 92,
...
"hits": {
"total": 242836, 1
"max_score": 0,
"hits": []
},
"aggregations": { 2
"download_sum": { 3
"value": 2197438700 4
}
}
}
Let us understand the key aspects of the response. The key parts are numbered 1, 2, 3, and so on, and are explained in the following points:

1. The hits.total element shows the number of documents that were in the context of the query. Since no query was specified, all documents in the type were included.
2. The aggregations element at the top level holds the results of all the aggregations that were requested.
3. download_sum is the name of the aggregation that we specified in the request.
4. The value element holds the actual result of the sum aggregation, that is, the sum of the downloadTotal field across all matching documents.
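Note that you can narrow the context by adding a regular query element to the same request; the aggregation is then computed only over the matching documents. Here is a minimal sketch, assuming the index has a customer field holding, for illustration, a value such as Linkedin:
GET bigginsight/_search
{
  "query": {
    "term": {
      "customer": "Linkedin"
    }
  },
  "aggregations": {
    "download_sum": {
      "sum": {
        "field": "downloadTotal"
      }
    }
  },
  "size": 0
}
The response has the same shape; only the hits.total and the computed sum change to reflect the filtered context.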
The average, min, and max aggregations are very similar. Let's look at them briefly.
The average aggregation finds the average across all documents in the query context:
GET bigginsight/_search
{
"aggregations": {
"download_average": { 1
"avg": { 2
"field": "downloadTotal"
}
}
},
"size": 0
}
The only notable differences from the sum aggregation are as follows:

1. We chose a different name, download_average, to make it apparent that the aggregation computes the average.
2. The type of aggregation is avg instead of sum.

The response structure is identical, but the value field will now represent the average of the requested field. The min and max aggregations are exactly the same.
Here is how we will find the minimum value of the downloadTotal field in the entire index/type:
GET bigginsight/_search
{
"aggregations": {
"download_min": {
"min": {
"field": "downloadTotal"
}
}
},
"size": 0
}
Finally, let's look at the max aggregation.
Here is how we will find the maximum value of the downloadTotal field in the entire index/type:
GET bigginsight/_search
{
"aggregations": {
"download_max": {
"max": {
"field": "downloadTotal"
}
}
},
"size": 0
}
These aggregations were really simple. Now let's look at the slightly more advanced, yet still simple, stats and extended stats aggregations.
These aggregations compute some common statistics in a single request without having to issue multiple requests. This saves resources on the Elasticsearch side as well because the statistics are computed in a single pass rather than being requested multiple times. The client code also becomes simpler if you are interested in more than one of these statistics.
Let's look at the stats aggregation first.
The stats aggregation computes the sum, average, min, max, and count of documents in a single pass:
GET bigginsight/_search
{
"aggregations": {
"download_stats": {
"stats": {
"field": "downloadTotal"
}
}
},
"size": 0
}
The structure of the stats request is the same as the other metric aggregations we have seen so far, so nothing special is going on here.
The response should look like the following:
{
"took": 4,
...,
"hits": {
"total": 242836,
"max_score": 0,
"hits": []
},
"aggregations": {
"download_stats": {
"count": 242835,
"min": 0,
"max": 241213,
"avg": 9049.102065188297,
"sum": 2197438700
}
}
}
As you can see, the response with the download_stats element contains count, min, max, average, and sum; everything is included in the same response. This is very handy as it reduces the overhead of multiple requests and also simplifies the client code.
Let us look at the extended stats aggregation.
The extended stats aggregation returns a few more statistics in addition to the ones returned by the stats aggregation:
GET bigginsight/_search
{
"aggregations": {
"download_estats": {
"extended_stats": {
"field": "downloadTotal"
}
}
},
"size": 0
}
The response looks like the following:
{
"took": 15,
"timed_out": false,
...,
"hits": {
"total": 242836,
"max_score": 0,
"hits": []
},
"aggregations": {
"download_estats": {
"count": 242835,
"min": 0,
"max": 241213,
"avg": 9049.102065188297,
"sum": 2197438700,
"sum_of_squares": 133545882701698,
"variance": 468058704.9782911,
"std_deviation": 21634.664429528162,
"std_deviation_bounds": {
"upper": 52318.43092424462,
"lower": -34220.22679386803
}
}
}
}
It also returns the sum of squares, variance, standard deviation, and standard deviation bounds. By default, the standard deviation bounds are computed at two standard deviations above and below the average, which is exactly what we see in the upper and lower values of std_deviation_bounds.
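The interval used for the bounds can be changed with the sigma parameter of the extended_stats aggregation. For example, here is a sketch of a request asking for bounds at three standard deviations:
GET bigginsight/_search
{
  "aggregations": {
    "download_estats": {
      "extended_stats": {
        "field": "downloadTotal",
        "sigma": 3
      }
    }
  },
  "size": 0
}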
Finding the count of unique elements can be done with the cardinality aggregation. It is similar to finding the result of a query such as the following:
SELECT count(*) FROM (SELECT DISTINCT username FROM usageReport) u;
Finding the cardinality, or the number of unique values, of a specific field is a very common requirement. If you have click-stream data from the different visitors on your website, you may want to find out how many unique visitors you got in a given day, week, or month. Let us find out the count of unique users for whom we have network traffic data:
GET bigginsight/_search
{
"aggregations": {
"unique_visitors": {
"cardinality": {
"field": "username"
}
}
},
"size": 0
}
The cardinality aggregation response is just like the other metric aggregations:
{
"took": 110,
...,
"hits": {
"total": 242836,
"max_score": 0,
"hits": []
},
"aggregations": {
"unique_visitors": {
"value": 79
}
}
}
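It is worth knowing that the cardinality aggregation returns an approximate count; internally, it is based on the HyperLogLog++ algorithm. The trade-off between memory and accuracy can be tuned with the precision_threshold parameter; below this threshold, counts are expected to be close to exact. Here is a sketch of the same request with an explicit threshold:
GET bigginsight/_search
{
  "aggregations": {
    "unique_visitors": {
      "cardinality": {
        "field": "username",
        "precision_threshold": 1000
      }
    }
  },
  "size": 0
}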
To summarize, we learned how to perform various metric aggregations on numeric datasets, making it easy to use Elasticsearch as the engine of a powerful analytics application.
If you found this tutorial useful, do check out the book Learning Elastic Stack 6.0 to examine the fundamentals of the Elastic Stack in detail and start developing solutions to problems such as logging, site search, app search, metrics, and more.