A PHP full-text parser
To show the layout of a full-text parser plugin we will create a simple parser to parse PHP scripts. PHP syntax has a few peculiarities that are not taken into account by the MySQL built-in full-text parser. In particular, all variable names in PHP start with a dollar sign, which is, in fact, a part of the name; a variable $while is not the same as a loop statement while. But a dollar sign is not just another character that can be used in variable names: the string "$foo$bar" contains two PHP variables, not one. Also, variables can have different scopes; a variable foo::$bar is not the same as a variable $bar. Let's try to solve this in our full-text parser plugin. According to the above, it will be a "tokenizer" plugin, that is, a plugin that splits the text into words.
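Before turning to the plugin itself, the dollar-sign rule described above can be sketched as a small standalone scanner. This is only an illustration under stated assumptions, not the plugin's code: the names is_name_char and next_token are hypothetical, the sketch does not use the MySQL plugin API, it treats only ASCII letters, digits, and underscores as name characters, and it does not handle scope prefixes such as foo:: at all.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: true if c may appear inside a name.
   Assumption: only ASCII letters, digits, and underscores count. */
static int is_name_char(unsigned char c)
{
  return isalnum(c) || c == '_';
}

/*
  Hypothetical scanner: find the next token in [*pos, end).

  A token is an optional '$' followed by a run of name characters.
  A '$' always starts a new token, so "$foo$bar" yields "$foo" and
  "$bar". Scope prefixes such as foo:: are NOT handled here.

  Stores the token start in *start, advances *pos past the token,
  and returns the token length (0 when the input is exhausted).
*/
static size_t next_token(const char **pos, const char *end,
                         const char **start)
{
  const char *p = *pos;
  assert(p != NULL && end != NULL && p <= end);

  /* Skip separators: anything that is neither '$' nor a name character. */
  while (p < end && *p != '$' && !is_name_char((unsigned char) *p))
    p++;

  *start = p;
  if (p < end && *p == '$')
    p++;                        /* the '$' is part of the variable name */
  while (p < end && is_name_char((unsigned char) *p))
    p++;                        /* stops at the next '$' or separator */

  *pos = p;
  return (size_t) (p - *start);
}
```

Called repeatedly on the input "$foo$bar" while $while (with the quotes included), this sketch returns $foo, $bar, while, and $while, so the variable $while and the keyword while remain distinct words.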
As usual, we start by including the required header files:
#include <mysql/plugin.h>
#include <stdio.h>
#include <ctype.h>
A valid PHP variable name can contain letters, underscores, digits...