Improving the search experience using stemming
Stemming is the process where we convert an English word to its base form.
Some of the examples are shown as follows:
[ running , ran ] => run [ laughing , laugh , laughed ] => laugh
With this, our search would be a lot better. When someone searches for run
, then all documents, no matter whether they consist the word running or ran, will also be shown as a match.
Note
This is possible because we apply the analyzer used for indexing on the search side too.
To achieve this, we have two approaches:
The algorithmic approach: In this approach, we use a generic language-based algorithm to convert words to their stems or base form
The dictionary-based approach: In this approach, we use a lookup mechanism to map the word to its base form
Let's see how the algorithmic approach can be implemented and what are its pros and cons.
Our first choice is the snowball algorithm. It's a powerful algorithm that finds the stem of a given word using an algorithm...