A Soundex full-text parser
We have seen the "tokenizer" and "extractor" plugins. We finish this chapter with a "post-processor" plugin. Such a plugin wants to do something with the individual words of the text, but not to split the text into words itself. It places itself after the mysql_parse() function, but before mysql_add_word(). In this position it can see every word and modify it if needed, while MySQL does the parsing. Again, just as with "extractor" plugins, this technique lets us implement only the main functionality of the plugin, the part that makes it unique, without repeating parsing code that already exists in the server. As an example of a "post-processor" plugin we will create a Soundex plugin: a plugin that replaces every word with its Soundex code, making the full-text search tolerant of typos and misspellings that leave a word sounding the same.
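To make the idea of sitting between the two callbacks concrete, here is a minimal, self-contained sketch of the interception pattern. It does not use the real plugin API: sketch_param is a hypothetical, simplified stand-in for the server's parser parameter structure, and fake_parse(), fake_add_word(), and index_document() are invented stand-ins for the server side so that the example can run on its own. The soundex() helper implements the common American Soundex variant, which is an assumption and may differ in detail from the one developed in this chapter.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the server's parser parameters. */
typedef struct sketch_param {
  int (*parse)(struct sketch_param *, char *doc, int len);
  int (*add_word)(struct sketch_param *, char *word, int len);
  void *state;                       /* plugin-private data */
} sketch_param;

typedef int (*add_word_fn)(sketch_param *, char *, int);

/* Common American Soundex: first letter plus three digits, zero-padded. */
static void soundex(const char *word, int len, char out[5])
{
  /* Digit for each letter A..Z; '0' marks vowels, H, W and Y. */
  static const char codes[] = "01230120022455012623010202";
  char prev = '0';
  int n = 0;

  memset(out, '0', 4);
  out[4] = '\0';
  for (int i = 0; i < len && n < 4; i++) {
    char up = (char) toupper((unsigned char) word[i]);
    if (up < 'A' || up > 'Z') continue;          /* skip non-letters */
    char code = codes[up - 'A'];
    if (n == 0) { out[n++] = up; prev = code; continue; }
    if (up == 'H' || up == 'W') continue;        /* H/W do not split a run */
    if (code != '0' && code != prev) out[n++] = code;
    prev = code;
  }
}

/* The hook: sees each word the server found and forwards its code instead. */
static int soundex_add_word(sketch_param *param, char *word, int len)
{
  add_word_fn orig = *(add_word_fn *) param->state;
  char code[5];

  soundex(word, len, code);
  return orig(param, code, 4);       /* assumes the receiver copies the word */
}

/* Plugin entry point: install the hook, let the server parse, restore. */
static int soundex_parse(sketch_param *param, char *doc, int len)
{
  assert(param->parse != NULL && param->add_word != NULL);
  add_word_fn orig = param->add_word;

  param->state = &orig;
  param->add_word = soundex_add_word;
  int rc = param->parse(param, doc, len);        /* the server splits words */
  param->add_word = orig;
  param->state = NULL;
  return rc;
}

/* ---- Stand-ins for the server side, for demonstration only ---- */
static char indexed[64];             /* words that reached the "index" */

static int fake_add_word(sketch_param *param, char *word, int len)
{
  size_t used = strlen(indexed);

  (void) param;
  if (used + (size_t) len + 2 > sizeof indexed) return 1;
  if (used) indexed[used++] = ' ';
  memcpy(indexed + used, word, (size_t) len);
  indexed[used + len] = '\0';
  return 0;
}

static int fake_parse(sketch_param *param, char *doc, int len)
{
  int start = -1;

  for (int i = 0; i <= len; i++) {
    int in_word = i < len && isalnum((unsigned char) doc[i]);
    if (in_word && start < 0) start = i;
    if (!in_word && start >= 0) {
      int rc = param->add_word(param, doc + start, i - start);
      if (rc) return rc;
      start = -1;
    }
  }
  return 0;
}

/* Runs the whole pipeline on one document; returns the indexed words. */
const char *index_document(const char *text)
{
  char doc[64];
  sketch_param param = { fake_parse, fake_add_word, NULL };

  strncpy(doc, text, sizeof doc - 1);
  doc[sizeof doc - 1] = '\0';
  indexed[0] = '\0';
  soundex_parse(&param, doc, (int) strlen(doc));
  return indexed;
}
```

With these stand-ins, index_document("Robert, Rupert") returns "R163 R163": the splitting is done entirely by the parser the plugin did not write, and both spellings reach the index as the same code. Note that the hook passes a buffer on its own stack to the original callback, so the sketch relies on the receiver copying the word before returning.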
The Soundex algorithm
The Soundex algorithm was patented in 1918. It is a phonetic algorithm that converts words to codes, which mainly corresponds...