Advanced sharding with SolrCloud
Let's explore some of the advanced concepts of sharding, starting with shard splitting.
Shard splitting
Let us say that we have created a two-shard replica looking at the current number of queries per second for a system. In future, if the number of queries per second increases to, say, twice or thrice the current value, we will need to add more shards. Now, one way is to create a separate cloud with say four shards and re-index all the documents. This is possible if the cluster is small. If we are dealing with a 50 shard cluster with more than a billion documents, re-indexing of the complete set of documents again may be expensive. For such scenarios, SolrCloud has the concept of shard splitting.
In shard splitting, a shard is divided into two new shards on the same machine. All three shards, the old one and the two new ones, remain. We can check the sanity of the shards and then delete the existing shard. Let us see a practical implementation of the same.
Before...