Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Save more on your purchases! discount-offer-chevron-icon
Savings automatically calculated. No voucher code required.
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Free Learning
Arrow right icon
Arrow up icon
GO TO TOP
Solr Cookbook - Third Edition

You're reading from   Solr Cookbook - Third Edition Solve real-time problems related to Apache Solr 4.x and 5.0 effectively with the help of over 100 easy-to-follow recipes

Arrow left icon
Product type Paperback
Published in Jan 2015
Publisher
ISBN-13 9781783553150
Length 356 pages
Edition 3rd Edition
Languages
Tools
Arrow right icon
Author (1):
Arrow left icon
Rafal Kuc Rafal Kuc
Author Profile Icon Rafal Kuc
Rafal Kuc
Arrow right icon
View More author details
Toc

Table of Contents (12) Chapters Close

Preface 1. Apache Solr Configuration FREE CHAPTER 2. Indexing Your Data 3. Analyzing Your Text Data 4. Querying Solr 5. Faceting 6. Improving Solr Performance 7. In the Cloud 8. Using Additional Functionalities 9. Dealing with Problems 10. Real-life Situations Index

Configuring SolrCloud for high-indexing use cases

Solr is designed to work under high load, both when it comes to querying and indexing. However, the default configuration provided with the example Solr deployment is not sufficient when it comes to these use cases. This recipe will show you how to prepare your SolrCloud collection configuration for use cases when the indexing rate is very high.

Getting ready

Before continuing reading the recipe, read the Running Solr on a standalone Jetty and Configuring SolrCloud for NRT use cases recipes in this chapter.

How to do it...

In very high indexing use cases, there are chances that you'll use bulk indexing to index your data. In addition to this, because we are talking about SolrCloud, we'll use autocommit so that we can leave the data durability and visibility management to Solr. Let's discuss how to prepare configuration for a use case where indexing is high, but the querying is quite low; for example, when using Solr for log centralization solutions.

Let's assume that we are indexing more than 1,000 documents per second and that we have four nodes, each of 12 cores and 64 GB of RAM. Note that this specification is not something we need to index the number of documents, but they are here for reference.

  1. First, we'll start with the autocommit configuration, which will look as follows (we add this to the solrconfig.xml file):
    <updateHandler class="solr.DirectUpdateHandler2">
     <updateLog>
      <str name="dir">${solr.ulog.dir:}</str>
     </updateLog>
    
     <autoSoftCommit>
      <maxTime>600000</maxTime>
     </autoSoftCommit>
    
     <autoCommit>
      <maxTime>15000</maxTime>
      <openSearcher>false</openSearcher>
     </autoCommit>
    </updateHandler>
  2. The second step is to adjust the number of indexing threads. To do this, we add the following information to the indexConfig section of solrconfig.xml:
    <maxIndexingThreads>10</maxIndexingThreads>
  3. The third step is to adjust the memory buffer size for each indexing thread. To do this, we add the following information to the indexConfig section of solrconfig.xml:
    <ramBufferSizeMB>128</ramBufferSizeMB>

Now, let's discuss what each of these changes mean.

How it works...

We started with tuning the autocommit setting, which you should be aware of after reading this recipe. Since we are not worried about documents being visible as soon as they are indexed, we set the soft autocommit's maxTime property to 600000. This means that we will reopen the searcher every 10 minutes, so our documents will be visible maximum 10 minutes after they are sent to indexation.

The one thing to look at is the short time for hard commit, which is every 15 seconds (the maxTime property of the autoCommit section set to 15000). We did this because we don't want transaction logs to contain a high number of entries because this can cause problems during the recovery process.

We also increased the default number of threads an index writer can use from the default 8 to 10 by setting the maxIndexingThreads property. Since we have 12 cores on each machine, and we are not querying much, we can allow more threads using the index writer. If the index writer uses the number of threads that's equal to the maxIndexingThreads property, the next thread will wait for one of the currently running to end. Remember that the maxIndexingThreads property sets the maximum allowed indexing threads, which doesn't mean they will be used every time.

We also increased the default RAM buffer size from 100 to 128 using the ramBufferSizeMB property. We did this to allow Lucene to buffer as many documents as needed in memory. If the size of the documents in the buffer is larger than the given value of the ramBufferSizeMB property, Lucene will flush the data to the directory, which will decide what else to do. We have to remember though that we are also using autocommit, so the data will be flushed every 15 seconds because of hard autocommit settings.

Note

Remember that we didn't take into consideration the size of the cluster because we had the maximum number of nodes. You should remember that if I/O is the bottleneck when indexing, spreading the collection among more nodes should help with the indexing load.

In addition to this, you might want to look at the merging policy and segment merge processes as this can become a major bottleneck. If you are interested, refer to the Tuning segment merging recipe in Chapter 9, Dealing with Problems.

You have been reading a chapter from
Solr Cookbook - Third Edition - Third Edition
Published in: Jan 2015
Publisher:
ISBN-13: 9781783553150
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $19.99/month. Cancel anytime
Banner background image