Searching for a phrase
There may be situations in which you need to find a document title among millions of documents, where a plain string-based search is not a good idea. So the question is: can this be done with Solr? Fortunately, yes, and the next example will guide you through it.
Assume that you need the following field type; add it to your schema.xml file:
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"/>
  </analyzer>
</fieldType>
Then add the following fields to your schema.xml file:
<field name="id" type="string" indexed="true" stored="true" required="true" />
<field name="title" type="text" indexed="true" stored="true" />
Assume that your data looks like this:
<add>
  <doc>
    <field name="id">1</field>
    <field name="title">2012 report</field>
  </doc>
  <doc>
    <field name="id">2</field>
    <field name="title">2007 report</field>
  </doc>
  <doc>
    <field name="id">3</field>
    <field name="title">2012 draft report</field>
  </doc>
</add>
Now, let us instruct Solr to find the documents that have the phrase 2012 report in the title. Send the following query to Solr:
http://localhost:8080/solr/select?q=title:"2012 report"
If you get the following result, your query worked:
<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">1</int>
    <lst name="params">
      <str name="q">title:"2012 report"</str>
    </lst>
  </lst>
  <result name="response" numFound="1" start="0">
    <doc>
      <str name="id">1</str>
      <str name="title">2012 report</str>
    </doc>
  </result>
</response>
The debug query (the debugQuery=on parameter) shows us which Lucene query was made:
<str name="parsedquery">PhraseQuery(title:"2012 report")</str>
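When these requests are sent from code rather than typed into a browser, the quotes, colon, and space in the query need URL encoding. The following is a minimal Python sketch, assuming the host, port, and path from the example URL; the helper name phrase_query_url is hypothetical and not part of Solr.

```python
from urllib.parse import urlencode

# Base select URL taken from the example above; adjust it for your own setup.
SOLR_SELECT_URL = "http://localhost:8080/solr/select"


def phrase_query_url(field, phrase, debug=False):
    """Build a select URL that searches `field` for `phrase` as a phrase query."""
    # Escape backslashes and embedded double quotes so the phrase stays one quoted unit.
    escaped = phrase.replace("\\", "\\\\").replace('"', '\\"')
    params = {"q": f'{field}:"{escaped}"'}
    if debug:
        # Ask Solr to include the parsed query in the response.
        params["debugQuery"] = "on"
    # urlencode percent-encodes the colon and quotes and turns the space into '+'.
    return f"{SOLR_SELECT_URL}?{urlencode(params)}"


print(phrase_query_url("title", "2012 report"))
print(phrase_query_url("title", "2012 report", debug=True))
```

The surrounding double quotes are added by the helper, so callers pass only the bare phrase.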
As you can see, the query returned just one document, omitting even the document titled 2012 draft report, which is exactly the result we wanted.
We used only two fields because this demonstration is concerned solely with searching for a phrase within the title field.
The query was sent to the standard Solr query parser, so we could specify the field name and the value we are looking for. It differs from a standard word-search query in that the value is enclosed in " characters at both the start and the end. This tells Solr to treat the search as a phrase query instead of a term query, which is what makes the difference. A phrase query tells Solr to match the words as a single unit, in order and next to each other, and not individually.
In addition, the debug query confirmed that a phrase query (that is, the desired one) was made instead of the standard term query.
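To make the difference between the two kinds of match concrete, here is a simplified Python sketch run over the three sample titles. It is only an illustration of the idea, not how Solr or Lucene is implemented: stemming is omitted, the function names are hypothetical, and the term-style match is assumed to require every word to be present.

```python
def tokenize(text):
    """Split on whitespace and lowercase, roughly like the analyzer above (no stemming)."""
    return text.lower().split()


def matches_all_terms(title, query):
    """Term-style match: every query word occurs somewhere in the title."""
    tokens = tokenize(title)
    return all(word in tokens for word in tokenize(query))


def matches_phrase(title, query):
    """Phrase-style match: the query words occur consecutively and in order."""
    tokens = tokenize(title)
    words = tokenize(query)
    n = len(words)
    # Slide a window of n tokens over the title and compare it with the phrase.
    return n > 0 and any(tokens[i:i + n] == words for i in range(len(tokens) - n + 1))


# Sample data from the example above, keyed by document id.
titles = {"1": "2012 report", "2": "2007 report", "3": "2012 draft report"}

term_hits = [doc_id for doc_id, t in titles.items() if matches_all_terms(t, "2012 report")]
phrase_hits = [doc_id for doc_id, t in titles.items() if matches_phrase(t, "2012 report")]

print(term_hits)    # ['1', '3']
print(phrase_hits)  # ['1']
```

Document 3 contains both words but has draft between them, so it passes the term-style check and fails the phrase check, which mirrors the single hit returned by Solr.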