Dgraph releases Ristretto, a fast, concurrent and memory-bound Go cache library

Last week, the team at Dgraph released Ristretto, a fast, fixed size, concurrent, memory-bound Go cache library. It is contention-proof and focuses on throughput as well as hit ratio performance.

There was a need for a memory-bound and concurrent Go cache in Dgraph so the team used a sharded map with shard eviction for releasing memory but that led to memory issues. The team then repurposed Groupcache’s LRU with the help of mutex locks for thread safety.

But later the team realised that the cache suffered from severe contention. The team removed this cache and improved their query latency by 5-10x as the cache was slowing down the process. The team realised that the concurrent cache story in Go is broken and it needs to be fixed.

The official page reads, “In March, we wrote about the State of Caching in Go, mentioning the problem of databases and systems requiring a smart memory-bound cache which can scale to the multi-threaded environment Go programs find themselves in.”

Ristretto is built on 3 key principles

Ristretto is built on three key principles including fast accesses, high concurrency, contention resistance and memory bounding. Let’s discuss more about the principles and how the team achieved them:

Fast hash with runtime.memhash

The team experimented with store interface within Ristretto and found out that sync.Map performs well for read-heavy workloads but it deteriorates for write workloads. As there was no thread-local storage, the team worked with sharded mutex-wrapped Go maps that gave good performance results. The team used 256 shards to ensure that the performance doesn’t get affected while working with a 64-core server.

With a shard based approach, the team also needed a quick way to understand which shard a key should go in. The long keys consumed too much memory so the team used uint64 for keys, instead of storing the entire key.

There was a need for using the hash of the key in multiple places and to quickly generate a fast hash, the team borrowed runtime.memhash from Go Runtime. The runtime.memhash function uses assembly code for generating a hash, quickly.

Handling concurrency and contention resistance with batching

The team wanted to achieve high hit ratios but that would require managing metadata about the information that is currently present in the cache and the information that will be needed in it. They took inspiration from the paper BP-Wrapper that explains two ways for mitigating contention: prefetching and batching. The team only used ‘batching’ to lower contention instead of acquiring a mutex lock for every metadata mutation.

While talking about concurrency, Ristretto performs well under heavy concurrent load but it would lose some metadata while delivering better throughput performance.

The page reads, “Interestingly, that information loss doesn’t hurt our hit ratio performance because of the nature of key access distributions. If we do lose metadata, it is generally lost uniformly while the key access distribution remains non-uniform. Therefore, we still achieve high hit ratios and the hit ratio degradation is small as shown by the following graph.”

Key cost

The workloads usually have variable-sized values where one value ncan cost a few bytes while another will cost few kilobytes and some other value might cost a few megabytes. In this case, it is not possible to have the same memory cost for all of them.

In Ristretto, the cost is attached to every key-value and users can easily specify what that cost is while calling the Set function. This cost is calculated against the MaxCost of the cache.

The page reads, “When the cache is operating at capacity, a heavy item could displace many lightweight items. This mechanism is nice in that it works well for all different workloads, including the naive approach where each key-value costs 1.”

To know more about Ristretto and its key principles in detail, check out the official post.