After months of development and testing, Redis 3.0 with cluster support was released on April 1st, 2015. A Redis Cluster is a set of Redis instances that connect to each other through a gossip protocol, with each instance serving a non-overlapping subset of the cached data. In this post, I'd like to talk about how users can benefit from it, and what those benefits cost.
As you may already know, the essence of Redis is that no matter what data structures it supports, it is simply a key-value caching utility. The same holds for Redis Cluster. A Redis Cluster does not magically shard the contents of your data across Redis instances. The key is still the unit of storage and cannot be split. For example, a list of 100 elements is still stored under one key, in one Redis instance, no matter how many instances the cluster has. More precisely, Redis Cluster takes the CRC16 of the key string modulo 16384 as the key's slot number, and each master instance serves some of the 16384 slots, so each instance is responsible only for the keys in the slots it owns.
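To make the slot rule concrete, here is a minimal sketch of the computation in Python. The function names `crc16_xmodem` and `key_slot` are mine, not part of any Redis client, and the sketch hashes the whole key string, ignoring the hash tag rule that real Redis applies to keys containing `{...}`.

```python
SLOT_COUNT = 16384  # number of hash slots in a Redis Cluster


def crc16_xmodem(data: bytes) -> int:
    """CRC16, XMODEM variant: polynomial 0x1021, initial value 0."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key: str) -> int:
    """Slot number of a key: CRC16 of the key string, modulo 16384."""
    return crc16_xmodem(key.encode("utf-8")) % SLOT_COUNT


if __name__ == "__main__":
    print(key_slot("foo"))  # 12182
```

Whatever the value stored under `foo` is (a string, a list of 100 elements, a hash), it all lives in the one instance that owns slot 12182.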
Knowing this, you may soon realize that Redis Cluster finally catches up with the multi-core trend. As we know, Redis is designed as an asynchronous, single-threaded program, which means that although it is non-blocking, it can use at most one CPU core. Since Redis Cluster simply splits keys across instances by hash and the instances serve data simultaneously, a cluster can use as many cores as it has instances, so its QPS may be much higher than that of a standalone Redis.
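As a rough illustration of how the load spreads, the sketch below counts how many keys would land on each master. It assumes the slots are divided evenly into contiguous ranges, which is only one possible layout, and the helper names (`owner_of`, `keys_per_master`) and the `user:N` key pattern are made up for the example. It relies on `binascii.crc_hqx` with an initial value of 0, which computes the same CRC16 variant.

```python
import binascii
from collections import Counter

SLOT_COUNT = 16384


def key_slot(key: str) -> int:
    # crc_hqx with initial value 0 is the CRC16 (XMODEM) used for slots.
    return binascii.crc_hqx(key.encode("utf-8"), 0) % SLOT_COUNT


def owner_of(slot: int, masters: int) -> int:
    """Index of the master owning `slot`, assuming an even contiguous split."""
    return slot * masters // SLOT_COUNT


def keys_per_master(keys, masters: int) -> Counter:
    """Count how many of `keys` each master would serve."""
    return Counter(owner_of(key_slot(k), masters) for k in keys)


if __name__ == "__main__":
    sample = [f"user:{i}" for i in range(10000)]  # hypothetical key names
    for master, count in sorted(keys_per_master(sample, 3).items()):
        print(f"master {master}: {count} keys")
```

Each master handles only its own share of the keys, on its own core, which is where the extra throughput comes from.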
More good news: Redis instances on different hosts can be joined into one cluster, so the memory a Redis service can use is no longer limited to one host machine. You no longer have to worry about how much memory Redis may consume three months from now, because when memory is about to run out, you can extend capacity by starting more cluster-mode instances, joining them to the cluster and resharding. There is also great news for those who turn on persistence options (RDB or AOF). When Redis takes an RDB snapshot or rewrites the AOF, it forks before writing the data, which can cause latency if your dataset is really large. But no single instance in a cluster holds a large dataset, since the data is sharded and each instance persists only its own subset.
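The bookkeeping side of a reshard can be sketched as follows. This is only a simulation of slot ownership in plain Python under my own assumptions (even contiguous ranges to start with, slots taken from the largest owner first); a real reshard is done with the cluster tooling and also migrates the keys stored in each moved slot. The node names and the functions `even_layout` and `add_node` are hypothetical.

```python
SLOT_COUNT = 16384


def even_layout(nodes):
    """Assign contiguous slot ranges to `nodes` as evenly as possible."""
    layout = {}
    for i, node in enumerate(nodes):
        start = i * SLOT_COUNT // len(nodes)
        end = (i + 1) * SLOT_COUNT // len(nodes)
        layout[node] = list(range(start, end))
    return layout


def add_node(layout, new_node):
    """Return a new layout in which `new_node` owns an equal share of slots."""
    target = SLOT_COUNT // (len(layout) + 1)
    result = {node: list(slots) for node, slots in layout.items()}
    result[new_node] = []
    while len(result[new_node]) < target:
        # Always take the next slot from whichever old node owns the most.
        donor = max((n for n in result if n != new_node),
                    key=lambda n: len(result[n]))
        result[new_node].append(result[donor].pop())
    return result


if __name__ == "__main__":
    before = even_layout(["a", "b", "c"])
    after = add_node(before, "d")
    for node, slots in after.items():
        print(node, len(slots))
```

After the move, every node owns roughly a quarter of the slots, so each holds (and persists) a smaller share of the data than before.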
The next advantage is improved availability. A Redis Cluster is much more robust than a standalone Redis if you deploy a slave for each master. Slaves in cluster mode differ from those in standalone mode: a slave can automatically fail over for its master when the master becomes unreachable (killed accidentally, network fault, etc.). And the gossip protocol mentioned earlier means there is no central controller in a Redis Cluster, so if a master goes down and is replaced by its slave, the other nodes will tell you who the new guy to access is.
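On the client side, "telling you who to access" takes the form of a redirection error such as `MOVED 3999 127.0.0.1:6381`, which names the slot and the node that now serves it. Below is a minimal sketch of parsing such a message; the `Redirect` type and `parse_redirect` function are my own illustration, not the API of any particular client library.

```python
from typing import NamedTuple, Optional


class Redirect(NamedTuple):
    kind: str   # "MOVED" or "ASK"
    slot: int
    host: str
    port: int


def parse_redirect(error: str) -> Optional[Redirect]:
    """Parse 'MOVED <slot> <host>:<port>' (or ASK); return None otherwise."""
    parts = error.split()
    if len(parts) != 3 or parts[0] not in ("MOVED", "ASK"):
        return None
    # rpartition splits on the last colon, so the port is always the tail.
    host, _, port = parts[2].rpartition(":")
    return Redirect(parts[0], int(parts[1]), host, int(port))


if __name__ == "__main__":
    print(parse_redirect("MOVED 3999 127.0.0.1:6381"))
```

A client would typically retry the command against the returned host and port, and on MOVED also remember the new owner of that slot for later requests.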
Besides the good things Redis Cluster offers, we should also look at what a cluster cannot do, or cannot do well. The cluster model Redis chose sacrifices consistency for availability, which is good enough for a data caching solution. As a consequence, though, you will soon run into problems with multi-key commands like MGET, since Redis Cluster requires all keys touched by one operation to be in the same slot (otherwise you get a CROSSSLOT error). This restriction is so strong that such operations, not only MGET and MSET but also EVAL, SUNION, BRPOPLPUSH, etc., are generally unusable in a cluster unless their keys happen to share a slot. And if you deliberately store all keys in one slot, the cluster loses its point.
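One way to live with the same-slot rule for something like MGET is to group the keys by slot on the client and issue one command per group. The sketch below shows only the grouping step; `group_by_slot` is a hypothetical helper, and it hashes whole key strings without any hash tag handling.

```python
import binascii
from collections import defaultdict

SLOT_COUNT = 16384


def key_slot(key: str) -> int:
    # crc_hqx with initial value 0 is the CRC16 (XMODEM) used for slots.
    return binascii.crc_hqx(key.encode("utf-8"), 0) % SLOT_COUNT


def group_by_slot(keys):
    """Map slot number -> list of keys in that slot, preserving input order."""
    groups = defaultdict(list)
    for key in keys:
        groups[key_slot(key)].append(key)
    return dict(groups)


if __name__ == "__main__":
    for slot, group in group_by_slot(["foo", "bar", "hello"]).items():
        # Each group is safe to pass to a single multi-key command.
        print(slot, group)
```

Note the cost: what used to be one round trip becomes one per slot, and the results are no longer fetched atomically.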
Another practice to avoid is storing many large objects, such as overwhelmingly huge lists, hashes, or sets, which cannot be sharded. You can break a hash down into individual keys, but then you can no longer fetch it with a single HGETALL. You should also think about how to split lists or sets if you want to take advantage of the cluster.
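A middle ground between one huge hash and one key per field is to spread the fields over a fixed number of smaller hashes. The sketch below shows one possible naming scheme; the `name:index` key format, the bucket count, and the use of CRC32 to pick a bucket are all my own assumptions, not a Redis convention.

```python
import zlib


def bucket_key(name: str, field: str, buckets: int) -> str:
    """Key of the smaller hash that should hold `field` of logical hash `name`."""
    index = zlib.crc32(field.encode("utf-8")) % buckets
    return f"{name}:{index}"


def all_bucket_keys(name: str, buckets: int):
    """Every key that together makes up the logical hash `name`."""
    return [f"{name}:{i}" for i in range(buckets)]


if __name__ == "__main__":
    # A field is always written to and read from the same bucket.
    print(bucket_key("profiles", "user:42", 16))
    print(all_bucket_keys("profiles", 4))
```

Reading or writing a single field still touches only one key, while the equivalent of HGETALL means visiting every bucket key, and those may live on different instances.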
Those are the things you should know about Redis Cluster if you decide to use it. I must say it's a great improvement in availability and performance, as long as you don't use those multi-key commands frequently. So, stay with standalone Redis or move on to Redis Cluster: it's time to make your choice.