Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Free Learning
Arrow right icon
Arrow up icon
GO TO TOP
Solr Cookbook - Third Edition

You're reading from   Solr Cookbook - Third Edition Solve real-time problems related to Apache Solr 4.x and 5.0 effectively with the help of over 100 easy-to-follow recipes

Arrow left icon
Product type Paperback
Published in Jan 2015
Publisher
ISBN-13 9781783553150
Length 356 pages
Edition 3rd Edition
Languages
Tools
Arrow right icon
Author (1):
Arrow left icon
Rafal Kuc Rafal Kuc
Author Profile Icon Rafal Kuc
Rafal Kuc
Arrow right icon
View More author details
Toc

Table of Contents (12) Chapters Close

Preface 1. Apache Solr Configuration FREE CHAPTER 2. Indexing Your Data 3. Analyzing Your Text Data 4. Querying Solr 5. Faceting 6. Improving Solr Performance 7. In the Cloud 8. Using Additional Functionalities 9. Dealing with Problems 10. Real-life Situations Index

Choosing the proper directory configuration

One of the most crucial properties of Apache Lucene and Solr is the Lucene Directory implementation. The directory interface provides an abstraction layer for all I/O operations for the Lucene library. Although it seems simple, choosing the right directory implementation can affect the performance of your Solr setup in a drastic way. This recipe will show you how to choose the right directory implementation.

How to do it...

In order to use the desired directory, all you need to do is choose the right directory factory implementation and inform Solr about it. Let's assume that you want to use NRTCachingDirectory as your directory implementation. In order to do this, you need to place (or replace if it is already present) the following fragment in your solrconfig.xml file:

<directoryFactory name="DirectoryFactory" class="solr.NRTCachingDirectoryFactory" />

That's all. The setup is quite simple, but I think that the question that will arise is what directory factories are available to use. When this book was written, the following directory factories were available:

  • solr.StandardDirectoryFactory
  • solr.SimpleFSDirectoryFactory
  • solr.NIOFSDirectoryFactory
  • solr.MMapDirectoryFactory
  • solr.NRTCachingDirectoryFactory
  • solr.HdfsDirectoryFactory
  • solr.RAMDirectoryFactory

Now, let's see what each of these factories provides.

How it works...

Before we get into the details of each of the presented directory factories, I would like to comment on the directory factory configuration parameter. All you need to remember is that the name attribute of the directoryFactory tag should be set to DirectoryFactory, and the class attribute should be set to the directory factory implementation of your choice. Also, some of the directory implementations can take additional parameters that define their behavior. We will talk about some of them in other recipes in the book (for example, in the Limiting I/O usage recipe in this chapter).

If you want Solr to make decisions for you, you should use the solr.StandardDirectoryFactory directory factory. It is filesystem-based and tries to choose the best implementation based on your current operating system and Java virtual machine used. If you implement a small application that won't use many threads, you can use the solr.SimpleFSDirectoryFactory directory factory that stores the index file on your local filesystem, but it doesn't scale well with a high number of threads. The solr.NIOFSDirectoryFactory directory factory scales well with many threads, but remember that it doesn't work well on Microsoft Windows platforms (it's much slower) because of a JVM bug (http://bugs.java.com/bugdatabase/view_bug.do?bug_id=6265734).

The solr.MMapDirectoryFactory directory factory has been the default directory factory for Solr for 64-bit Linux systems since Solr 3.1. This directory implementation uses virtual memory and the kernel feature called mmap to access index files stored on disk. This allows Lucene (and thus Solr) to directly access the I/O cache. This is desirable, and you should stick to this directory if near real-time searching is not needed.

If you need near real-time indexing and searching, you should use solr.NRTCachingDirectoryFactory. It is designed to store some parts of the index in memory (small chunks), and thus speeds up some near real-time operations greatly. By saying near real-time, we mean that the documents are available within milliseconds from indexing.

Note

If you want to know more about near real-time search and indexing, refer to a great explanation on the phrase on Solr wiki, available at https://wiki.apache.org/lucene-java/NearRealtimeSearch.

The solr.HdfsDirectoryFactory is used when Solr runs on HDFS filesystems, so inside a Hadoop cluster. If you are using Solr inside a Hadoop cluster, then it is almost certain that you'll want to use the directory implementation.

The last directory factory, solr.RAMDirectoryFactory, is the only one that is not persistent. The whole index is stored in the RAM memory, and thus, you'll lose your index after a restart or server crash. Also, you should remember that replication won't work when using solr.RAMDirectoryFactory. One might ask why I should use this factory? Imagine a volatile index autocomplete functionality or for unit tests of your query's relevance, or just anything you can think of when you don't need to have persistent and replicated data. However, remember that this directory is not designed to hold large amounts of data.

You have been reading a chapter from
Solr Cookbook - Third Edition - Third Edition
Published in: Jan 2015
Publisher:
ISBN-13: 9781783553150
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $19.99/month. Cancel anytime
Banner background image