ICU analysis plugin
Elasticsearch has an ICU analysis plugin. You can use this plugin to apply the normalization forms mentioned in the previous section, ensuring that all of your tokens are in the same form. Note that the plugin version must be compatible with the version of Elasticsearch installed on your machine:
bin/plugin install elasticsearch/elasticsearch-analysis-icu/2.7.0
After installation, the plugin registers its normalization token filter by default under icu_normalizer or icuNormalizer. The following example shows how to use it:
# Assumes Elasticsearch is listening on the default address, localhost:9200
curl -XPUT 'localhost:9200/my_index' -d '{
  "settings": {
    "analysis": {
      "filter": {
        "nfkc_normalizer": {
          "type": "icu_normalizer",
          "name": "nfkc"
        }
      },
      "analyzer": {
        "my_normalizer": {
          "tokenizer": "icu_tokenizer",
          "filter": [ "nfkc_normalizer" ]
        }
      }
    }
  }
}'
The preceding configuration normalizes all tokens into the NFKC normalization form.
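To get a feel for what NFKC normalization does to individual tokens, here is a minimal sketch in Python. It uses the standard library's unicodedata module rather than ICU or Elasticsearch, so it only approximates what the token filter does; the helper name nfkc_tokens and the sample strings are made up for illustration.

```python
import unicodedata


def nfkc_tokens(tokens):
    """Return each token converted to NFKC (hypothetical helper, not part of the plugin)."""
    return [unicodedata.normalize("NFKC", token) for token in tokens]


# Sample tokens that look like ordinary text but use different code points
samples = [
    "\ufb01ne",      # "fine" spelled with the single-codepoint fi ligature (U+FB01)
    "cafe\u0301",    # "cafe" followed by a combining acute accent (U+0301)
    "\uff11\uff12",  # the digits 1 and 2 in their full-width forms
]

# Print the escaped form of each token before and after normalization
for before, after in zip(samples, nfkc_tokens(samples)):
    print(ascii(before), "->", ascii(after))
```

After normalization, the ligature becomes the two letters "fi", the letter plus combining accent becomes the single precomposed character U+00E9, and the full-width digits become plain ASCII digits, so equivalent spellings end up as identical tokens.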
Note
For more information about ICU, refer to http://site.icu...