Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Free Learning
Arrow right icon
Arrow up icon
GO TO TOP
Solr Cookbook - Third Edition

You're reading from   Solr Cookbook - Third Edition Solve real-time problems related to Apache Solr 4.x and 5.0 effectively with the help of over 100 easy-to-follow recipes

Arrow left icon
Product type Paperback
Published in Jan 2015
Publisher
ISBN-13 9781783553150
Length 356 pages
Edition 3rd Edition
Languages
Tools
Arrow right icon
Author (1):
Arrow left icon
Rafal Kuc Rafal Kuc
Author Profile Icon Rafal Kuc
Rafal Kuc
Arrow right icon
View More author details
Toc

Table of Contents (12) Chapters Close

Preface 1. Apache Solr Configuration FREE CHAPTER 2. Indexing Your Data 3. Analyzing Your Text Data 4. Querying Solr 5. Faceting 6. Improving Solr Performance 7. In the Cloud 8. Using Additional Functionalities 9. Dealing with Problems 10. Real-life Situations Index

Changing similarity

Most times, the default way to calculate the score of your documents is what you need. However, sometimes you need more from Solr than just the standard behavior. For example, you might want shorter documents to be more valuable compared to longer ones. Let's assume that you want to change the default behavior and use different score calculation algorithms for the description field of your index. This recipe will show you how to leverage this functionality.

Getting ready

Before choosing one of the score calculation algorithms available in Solr, it's good to read a bit about them. The detailed description of all the algorithms is beyond the scope of this recipe and the book (although a simple description is mentioned later in the recipe), but I suggest visiting the Solr wiki page (or Javadocs) and reading basic information about the available implementations.

How to do it...

For the purpose of this recipe, let's assume we have the following index structure (just add the following entries to your schema.xml file):

<field name="id" type="string" indexed="true" stored="true" required="true" />
<field name="name" type="text_general" indexed="true" stored="true"/>
<field name="description" type="text_general_dfr" indexed="true" stored="true" />

The string and text_general types are available in the default schema.xml file provided with the example Solr distribution. However, we want DFRSimilarity to be used to calculate the score for the description field. In order to do this, we introduce a new type, which is defined as follows (just add the following entries to your schema.xml file):

<fieldType name="text_general_dfr" class="solr.TextField" positionIncrementGap="100">
 <analyzer type="index">
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
  <filter class="solr.LowerCaseFilterFactory"/>
 </analyzer>
 <analyzer type="query">
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
  <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
  <filter class="solr.LowerCaseFilterFactory"/>
 </analyzer>
 <similarity class="solr.DFRSimilarityFactory">
  <str name="basicModel">P</str>
  <str name="afterEffect">L</str>
  <str name="normalization">H2</str>
  <float name="c">7</float>
 </similarity>
</fieldType>

Also, to use the per-field similarity, we have to add the following entry to your schema.xml file:

<similarity class="solr.SchemaSimilarityFactory"/>

That's all. Now, let's have a look and see how this works.

How it works...

The index structure previously presented is pretty simple as there are only three fields. The one thing we are interested in is that the description field uses our own custom field type called text_generanl_dfr.

The thing we are most interested in is the new field type definition called text_general_dfr. As you can see, apart from the index and query analyzer, there is an additional section called similarity. It is responsible for specifying which similarity implementation to use to calculate the score for a given field. You are probably used to defining field types, filters, and other things in Solr, so you probably know that the class attribute is responsible for specifying the class that implements the desired similarity implementation, in our case, solr.DFRSimilarityFactory. Also, if there is a need, you can specify additional parameters that configure the behavior of your chosen similarity class. In the previous example, we specified the four additional parameters of basicModel, afterEffect, normalization, and c, all of which define the DFRSimilarity behavior.

The solr.SchemaSimilarityFactory class is required to specify the similarity for each field.

Although the recipe is not about all the similarities available, I wanted to list the available ones. Note that each similarity might require and use different configuration parameters (all of them are described in the provided Javadocs). The list of currently available similarity factories are:

There's more...

In addition to per-field similarity definition, you can also configure the global similarity.

Changing the global similarity

Apart from specifying the similarity class on a per-field basis, you can choose fields other than the default one in a global way. For example, if you want to use BM25Similarity as the default field, you should add the following entry to your schema.xml file:

<similarity class="solr.BM25SimilarityFactory"/>

As with the per-field similarity, you need to provide the name of the factory class that is responsible for creating the appropriate similarity class.

You have been reading a chapter from
Solr Cookbook - Third Edition - Third Edition
Published in: Jan 2015
Publisher:
ISBN-13: 9781783553150
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $19.99/month. Cancel anytime
Banner background image