Using the n-gram approach to do performant trailing wildcard searches
Many users who have worked with traditional relational databases are used to wildcard searches. The most common are those that use the * character, which matches zero or more characters. If you have used SQL databases, you have probably seen searches such as this:
AND name LIKE 'ABC12%'
However, wildcard searches are not very efficient in Solr. This is because Solr needs to enumerate all the matching terms when the query is executed. So, how do we prepare our Solr deployment to handle trailing wildcard characters efficiently? This recipe will show you how to prepare your data and run such searches efficiently.
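To make the idea concrete before turning to the Solr configuration, here is a minimal Python sketch of the difference between scanning values at query time and looking up precomputed prefixes. It is only an illustration of the concept, not Solr's implementation: the function names, the gram size limits, and the sample data are all hypothetical.

```python
from collections import defaultdict


def edge_ngrams(term, min_gram=1, max_gram=25):
    """Return the leading prefixes of term, from min_gram to max_gram characters."""
    upper = min(len(term), max_gram)
    return [term[:n] for n in range(min_gram, upper + 1)]


def build_prefix_index(docs, min_gram=1, max_gram=25):
    """Map every edge n-gram of each name to the ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, name in docs.items():
        for gram in edge_ngrams(name.lower(), min_gram, max_gram):
            index[gram].add(doc_id)
    return index


def wildcard_scan(docs, prefix):
    """Baseline: compare every stored value against the prefix at query time."""
    prefix = prefix.lower()
    return {doc_id for doc_id, name in docs.items()
            if name.lower().startswith(prefix)}


def prefix_lookup(index, prefix):
    """N-gram approach: a trailing wildcard query becomes one exact term lookup."""
    return set(index.get(prefix.lower(), set()))


# Hypothetical sample data, standing in for the values of a name field.
docs = {1: "ABC1234", 2: "ABC1299", 3: "ABD0001"}
index = build_prefix_index(docs)

# Equivalent in spirit to: name LIKE 'ABC12%'
print(prefix_lookup(index, "ABC12"))
```

Both functions return the same documents, but the lookup does its work at index time instead of at query time. The price is a larger index, because every prefix of every value is stored as a separate term.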
How to do it...
There are a few steps we need to take to make wildcard searches efficient using the n-gram approach:
The first step is to create a proper index structure. Let's assume we have the following fields defined in the schema.xml file:
<field name="id" type="string" indexed="true" stored="true" required="true" />
<field name...