Optimizing SolrCloud
In any distributed system, when a user fires a query across multiple nodes, the waiting time depends on the slowest nodes: each query takes as long as the slowest node that serves it, so the average response time is the average of those per-query maxima. For the indexes of your instance, this is called the "laggard problem". It states that the response time of your search query, which is an aggregation of results from all the shards, is governed by the following formula:
QueryResponse = avg(max(shardResponseTime))
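The formula above can be illustrated with a minimal Python sketch. It is not Solr code: the function names and the latency figures are hypothetical, chosen only to show how one slow shard dominates the averaged response time.

```python
def query_response_time(shard_times_ms):
    """A distributed query finishes only when its slowest shard has answered."""
    return max(shard_times_ms)


def average_response_time(queries):
    """Average, over all queries, of each query's slowest-shard time."""
    per_query = [query_response_time(times) for times in queries]
    return sum(per_query) / len(per_query)


# Hypothetical latencies in milliseconds: three queries, each sent to four shards.
# The fourth shard is the laggard.
queries = [
    [12, 15, 11, 240],
    [14, 13, 12, 180],
    [11, 16, 13, 210],
]

# The same queries with the laggard shard left out, for comparison.
without_laggard = [times[:3] for times in queries]

print(average_response_time(queries))          # 210.0
print(average_response_time(without_laggard))  # 15.0
```

With these made-up numbers, three of the four shards answer in under 20 ms, yet the averaged query response time is 210 ms because every query waits for the slowest shard.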
If your search is distributed across shards, the shard node with the slowest response time determines your query response time, which increases as that node slows down. Beyond the laggard problem, a distributed search also faces other limitations. For example, each document uploaded to the distributed Big Data store must have a unique key, and that unique key must be stored in the Solr repository. To do that, the Solr schema.xml should have stored=true against the key attribute. This key has to be unique across all shards. It enables Apache Solr to distribute...