When you execute an aggregation query, the node that receives the request sends the query to all the shards of the index. The results from each shard are gathered and sent back to the client. The aggregation results are cached at the shard level. Computing an aggregation is an expensive operation, so caching the response greatly improves performance and reduces the load on the system. The Elasticsearch cache is smart: it is automatically invalidated whenever a shard refreshes and picks up new data. Since the cache is per shard, it is invalidated only for the shards that have new or modified data. Starting with Elasticsearch 5.0, the request cache is enabled by default. The query JSON is used as the cache key.
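To make the per-shard behavior concrete, here is a small Python toy model. It is not Elasticsearch code: the `ShardRequestCache`, `Shard`, and `search` names are hypothetical, the "aggregation" is just a sum, and invalidation happens as soon as a document is indexed rather than on refresh. It only illustrates three points from the paragraph: results are cached per shard, only the shard that received new data recomputes, and the serialized request JSON acts as the key.

```python
import json


class ShardRequestCache:
    """Hypothetical per-shard cache (a toy model, not Elasticsearch's implementation)."""

    def __init__(self):
        self._entries = {}  # serialized request JSON -> cached aggregation result
        self.misses = 0     # how many times the aggregation had to be computed

    def get_or_compute(self, request_body, compute):
        # The request JSON, serialized as-is (key order preserved), is the cache key.
        key = json.dumps(request_body)
        if key not in self._entries:
            self.misses += 1
            self._entries[key] = compute()
        return self._entries[key]

    def invalidate(self):
        # Drop every cached entry for this shard only.
        self._entries.clear()


class Shard:
    def __init__(self, values):
        self.values = list(values)
        self.cache = ShardRequestCache()

    def index_doc(self, value):
        # New data on this shard invalidates this shard's cache, not the others.
        self.values.append(value)
        self.cache.invalidate()

    def aggregate(self, request_body):
        # The only "aggregation" in this toy model is a sum over the shard's values.
        return self.cache.get_or_compute(request_body, lambda: sum(self.values))


def search(shards, request_body):
    # The receiving node fans the request out to every shard and combines the results.
    return sum(shard.aggregate(request_body) for shard in shards)


shards = [Shard([1, 2]), Shard([3, 4])]
query = {"size": 0, "aggs": {"total": {"sum": {"field": "value"}}}}

print(search(shards, query))             # 10 (computed on both shards)
print(search(shards, query))             # 10 (served from both shard caches)
shards[1].index_doc(5)
print(search(shards, query))             # 15 (only shard 1 recomputes)
print([s.cache.misses for s in shards])  # [1, 2]
```

Because the key is the serialized JSON, sending the same query with its keys in a different order misses the cache in this model. The Elasticsearch documentation notes the same caveat for its request cache, so it pays to generate aggregation requests with a stable key order.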
The cache greatly improves performance for indexes that hold static data. For example, if you use time-based indexes, the older indexes no longer change. The aggregation results for the old data can...