Boosting phrases over words
In a competitive market, suppose that one day your online product hits a crisis in which the quality of its search results suddenly drops. To recover and stay competitive, you will probably want to favor documents that contain the exact phrase typed by the end user over documents that match only the separate words. This section shows how to achieve that.
I assume that we will use the dismax query parser instead of the standard one. Moreover, we will reuse the same schema.xml that was demonstrated in the Searching for a phrase section in this chapter.
Our sample data looks like this:
<add>
  <doc>
    <field name="id">1</field>
    <field name="title">Annual 2012 report final draft</field>
  </doc>
  <doc>
    <field name="id">2</field>
    <field name="title">2007 report</field>
  </doc>
  <doc>
    <field name="id">3</field>
    <field name="title">2012 draft report</field>
  </doc>
</add>
As mentioned earlier, we want to boost, or give preference to, documents that have phrase matches over other documents matching the query. To achieve this, run the following query against your Solr instance:
http://localhost:8080/solr/select?defType=dismax&pf=title^100&q=2012+report&qf=title
The result should look like this:
<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">1</int>
    <lst name="params">
      <str name="qf">title</str>
      <str name="pf">title^100</str>
      <str name="q">2012 report</str>
      <str name="defType">dismax</str>
    </lst>
  </lst>
  <result name="response" numFound="2" start="0">
    <doc>
      <str name="id">1</str>
      <str name="title">Annual 2012 report final draft</str>
    </doc>
    <doc>
      <str name="id">3</str>
      <str name="title">2012 draft report</str>
    </doc>
  </result>
</response>
The query uses a couple of parameters that might be new to you. Don't worry! I will explain all of them. The first parameter is defType, which tells Solr which query parser to use (dismax in our case). If you are not familiar with dismax or would like to learn more about it, http://wiki.apache.org/solr/DisMax is where you should go! One of the features of this query parser is the ability to tell Solr which fields should be used to search for phrases, and this is achieved using the pf parameter. The pf parameter takes a list of fields with their corresponding boosts; for instance, pf=title^100 means that a phrase found in the title field will be boosted by a value of 100. The q parameter is the standard query parameter, which you might already be familiar with. In our example, we passed the words we are searching for separated by a + character, which is a URL-encoded space, as the echoed q value in the response shows. We are looking for documents that satisfy '2012' AND 'report', that is, documents in which both words occur in the title.
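Typing the query URL by hand makes it easy to get the encoding wrong, so it can help to build it in code. The following is a minimal Python sketch, not part of the original recipe: the function name build_phrase_boost_url and the constant SOLR_SELECT_URL are hypothetical, and the host and port are simply copied from the example query.

```python
from urllib.parse import urlencode

# Assumed endpoint, copied from the example query; adjust it to your Solr instance.
SOLR_SELECT_URL = "http://localhost:8080/solr/select"


def build_phrase_boost_url(words, field="title", phrase_boost=100):
    """Return a dismax select URL that boosts phrase matches in `field`.

    Hypothetical helper for illustration only.
    """
    params = {
        "defType": "dismax",              # use the dismax query parser
        "qf": field,                      # field to search for the words
        "pf": f"{field}^{phrase_boost}",  # field and boost for phrase matches
        "q": words,                       # plain words, no fieldname:value syntax
    }
    # urlencode escapes the space as "+" and the caret as "%5E".
    return f"{SOLR_SELECT_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_phrase_boost_url("2012 report"))
```

The generated URL carries the same four parameters as the query shown earlier; only the parameter order and the percent-encoding of the ^ character differ.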
Tip
Remember that you can't pass a query such as fieldname:value in the q parameter when using the dismax query parser. The fields you are searching against should be specified using the qf parameter.
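As an illustration of this tip, the sketch below moves the field name from a fieldname:value style query into qf and leaves only the words in q. The helper fielded_to_dismax is hypothetical and handles just the simplest case of a single field followed by plain words.

```python
def fielded_to_dismax(fielded_query):
    """Split a simple 'fieldname:value' query into dismax parameters.

    Hypothetical helper for illustration: it supports one field only and
    does not understand any other standard query parser syntax.
    """
    field, separator, words = fielded_query.partition(":")
    if not separator:
        raise ValueError("expected a query of the form fieldname:value")
    return {
        "defType": "dismax",
        "qf": field.strip(),  # the field goes into qf
        "q": words.strip(),   # q carries only the words
    }


if __name__ == "__main__":
    print(fielded_to_dismax("title:2012 report"))
```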