The ICU Analysis plugin is a set of libraries that integrates the Lucene ICU module into Elasticsearch. ICU (International Components for Unicode) adds extended Unicode and globalization support, which notably improves text segmentation for Asian languages. From Elasticsearch's point of view, the plugin provides new text analysis components, as shown in the following table:
Component type | Component | Description |
Character filter | ICU Normalizer Character Filter | The icu_normalizer character filter converts text into unique, equivalent character sequences. It supports three optional parameters: name, mode, and unicode_set_filter. The name parameter can be nfc, nfkc, or nfkc_cf (the default). The mode parameter can be compose (the default) or decompose. |
Tokenizer | ICU Tokenizer | The icu_tokenizer tokenizer splits a piece of text into words on word boundaries. It adds... |
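
To make the icu_normalizer parameters concrete, the following is a minimal Python sketch that builds index settings combining a customized icu_normalizer character filter with the icu_tokenizer. The function name build_icu_settings and the component names my_icu_normalizer and my_icu_analyzer are hypothetical, chosen only for illustration. The sketch assumes the ICU Analysis plugin is installed and that the resulting JSON would be sent as the request body when creating an index.

```python
import json

# Values documented for the icu_normalizer character filter parameters.
ALLOWED_NAMES = {"nfc", "nfkc", "nfkc_cf"}
ALLOWED_MODES = {"compose", "decompose"}


def build_icu_settings(name="nfkc_cf", mode="compose"):
    """Build index settings with a custom icu_normalizer character filter.

    The names my_icu_normalizer and my_icu_analyzer are illustrative
    placeholders, not names defined by the plugin.
    """
    if name not in ALLOWED_NAMES:
        raise ValueError(f"unsupported normalization name: {name!r}")
    if mode not in ALLOWED_MODES:
        raise ValueError(f"unsupported normalization mode: {mode!r}")

    return {
        "settings": {
            "index": {
                "analysis": {
                    "char_filter": {
                        # Customized character filter using the chosen parameters.
                        "my_icu_normalizer": {
                            "type": "icu_normalizer",
                            "name": name,
                            "mode": mode,
                        }
                    },
                    "analyzer": {
                        # Custom analyzer: normalize first, then split on
                        # word boundaries with the ICU tokenizer.
                        "my_icu_analyzer": {
                            "type": "custom",
                            "char_filter": ["my_icu_normalizer"],
                            "tokenizer": "icu_tokenizer",
                        }
                    },
                }
            }
        }
    }


if __name__ == "__main__":
    # nfc combined with decompose requests decomposed (NFD) normalization.
    print(json.dumps(build_icu_settings(name="nfc", mode="decompose"), indent=2))
```

Validating name and mode before building the request body catches a misspelled parameter locally instead of at index creation time.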