Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Save more on your purchases! discount-offer-chevron-icon
Savings automatically calculated. No voucher code required.
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Free Learning
Arrow right icon
Arrow up icon
GO TO TOP
Solr Cookbook - Third Edition

You're reading from   Solr Cookbook - Third Edition Solve real-time problems related to Apache Solr 4.x and 5.0 effectively with the help of over 100 easy-to-follow recipes

Arrow left icon
Product type Paperback
Published in Jan 2015
Publisher
ISBN-13 9781783553150
Length 356 pages
Edition 3rd Edition
Languages
Tools
Arrow right icon
Author (1):
Arrow left icon
Rafal Kuc Rafal Kuc
Author Profile Icon Rafal Kuc
Rafal Kuc
Arrow right icon
View More author details
Toc

Table of Contents (12) Chapters Close

Preface 1. Apache Solr Configuration FREE CHAPTER 2. Indexing Your Data 3. Analyzing Your Text Data 4. Querying Solr 5. Faceting 6. Improving Solr Performance 7. In the Cloud 8. Using Additional Functionalities 9. Dealing with Problems 10. Real-life Situations Index

Configuring SolrCloud for NRT use cases

Nowadays, we are used to getting information as soon as we can. We want our data to be indexed fast, efficiently, and be available for searching as soon as possible; in perfect cases, right after they were sent for indexation. This is what near real time in Solr is all about— the ability to search the documents right after they are sent for indexation or with a very short latency. This recipe will show you how to configure Solr, especially SolrCloud for such use cases.

How to do it...

I assume that you already have SolrCloud set up and ready to go (if you don't, refer to the Creating a new SolrCloud cluster recipe in Chapter 7, In the Cloud); you will now know how to update your collection configuration and be interested in near real-time search.

Let's assume that we want our data to be available about one second after it's indexed. To do this, we need to change the solrconfig.xml file so that its update handler section looks as shown:

<updateHandler class="solr.DirectUpdateHandler2">
 <updateLog>
  <str name="dir">${solr.ulog.dir:}</str>
 </updateLog>
 
 <autoSoftCommit>
  <maxTime>1000</maxTime>
 </autoSoftCommit>
 
 <autoCommit>
  <maxTime>300000</maxTime>
  <openSearcher>false</openSearcher>
 </autoCommit>
</updateHandler>

That's all; after a restart or configuration reload, documents should be available to search after about one second.

How it works...

By changing the configuration of the update handler, we introduced three things. First, using the <updateLog> section, we told Solr to use the update log functionality. The transaction log (another name for this functionality) is a file where Solr writes raw documents so that they can be used in a recovery process. In SolrCloud, each instance of Solr needs to have its own transaction log configured. When a document is sent for indexation, it gets forwarded to the shard leader and the leader sends the document to all its replicas. After all the replicas respond to the leader, the leader itself responds to the node that sent the original request, and this node reports the indexing status to the client. At this point in time, the document is written into a transaction log, not yet indexed, but safely written; so, if a failure occurs (for example, the server shuts down), the document is not lost. During a startup process, the transaction log is replayed and the documents stored in it are indexed, so even if they were not indexed, they will be if a failure happens. After the process of storing the data in transaction logs, Solr can easily index the data located there.

The second thing is the autoSoftCommit section. This is a new autocommit option introduced in Solr 4.0. It basically allows us to reopen the index searcher without closing and opening a new one. For us, this means that our documents that were sent for indexation will start to be visible and available to search. We do this once every 1000 milliseconds as configured using the maxTime tag. The soft commit was introduced because reopening is easier to do and is less resource intensive than closing and opening a new index searcher. In addition to this, it doesn't persist the data to disk by creating a new segment.

However, one has to remember that even though the soft commit is less resource intensive, it is still not free. Some Solr caches will have to be reloaded, such as the filter, document, or query result caches. We will get into more configuration details in the Configuring SolrCloud for high-indexing use cases and Configuring SolrCloud for high-querying use cases recipes in this chapter.

The last thing is the autocommit defined in the autoCommit section, which is called the hard autocommit. It is responsible for flushing data and closing the index segment used for it (because of this segment, merge might start in the background). In addition to this, the hard autocommit also closes the transaction log and opens a new one. We've configured this operation to happen every 5 minutes (300000 milliseconds). What we also included is the <openSearcher>false</openSearcher> section. This means that Solr won't open a new index searcher during a hard auto commit operation. We do this on purpose; we define index searcher opening periods in the soft autocommit section. If we set the openSearcher section to true, Solr will close the old index searcher, open a new one, and automatically warm caches. Before Solr 4.0, this was the only way to have documents visible for searching when using autocommit.

One additional thing to remember is that with soft autocommit set to reopen the searcher very often, all the top level caches, such as the filter, document, and query result caches, will be invalidated. It is worth thinking and doing performance tests if the cache (all or some of them) are actually worth being used at all. I would like to give a clear advice here, but this is highly dependent on the use case. You can read more about cache configuration in the Configuring the document cache, Configuring the query result cache, and Configuring the filter cache recipes in Chapter 6, Improving Solr Performance.

You have been reading a chapter from
Solr Cookbook - Third Edition - Third Edition
Published in: Jan 2015
Publisher:
ISBN-13: 9781783553150
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $19.99/month. Cancel anytime
Banner background image