The Smart Chinese Analysis plugin integrates Lucene's Smart Chinese analysis module into Elasticsearch for analyzing Chinese or mixed Chinese-English text. Its analyzer uses probabilistic knowledge, derived from a hidden Markov model trained on a large corpus, to find the optimal word segmentation for Simplified Chinese text. It first breaks the input text into sentences and then segments each sentence into words. The plugin provides an analyzer called smartcn and a tokenizer called smartcn_tokenizer. Neither of them accepts any configuration parameters.
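As a minimal sketch of how the smartcn analyzer could be exercised once the plugin is installed, the following Python snippet builds a request for Elasticsearch's _analyze API using only the standard library. The node address http://localhost:9200 and the helper names build_analyze_request and analyze are assumptions made for illustration; they do not come from the plugin itself.

```python
import json
from urllib.request import Request, urlopen


def build_analyze_request(text, base_url="http://localhost:9200", analyzer="smartcn"):
    """Build (but do not send) a POST /_analyze request.

    base_url is an assumed local node address; adjust it to your cluster.
    """
    body = json.dumps({"analyzer": analyzer, "text": text}, ensure_ascii=False)
    return Request(
        url=f"{base_url}/_analyze",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def analyze(text, **kwargs):
    """Send the request and return the token strings from the response.

    Requires a running node with the plugin installed.
    """
    with urlopen(build_analyze_request(text, **kwargs)) as response:
        payload = json.load(response)
    return [token["token"] for token in payload["tokens"]]
```

Calling analyze with a Simplified Chinese sentence against a running node should return the segmented words. Because neither the analyzer nor the tokenizer takes parameters, the request body needs only the analyzer name and the text.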
To install the Smart Chinese Analysis plugin in the Elasticsearch Docker container, use the commands shown in the following screenshot, and then restart the container so that the plugin takes effect.
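The install-and-restart sequence can be sketched in Python as follows. This is an illustration under stated assumptions, not a transcription of the screenshot: the container name elasticsearch is hypothetical, the plugin is assumed to be published under the name analysis-smartcn, and bin/elasticsearch-plugin is assumed to resolve relative to the container's working directory, as it does in the official image.

```python
import subprocess

# Assumed plugin identifier for the Smart Chinese Analysis plugin.
PLUGIN_NAME = "analysis-smartcn"


def smartcn_install_commands(container="elasticsearch"):
    """Return the docker commands that install the plugin and restart the container.

    The container name is a placeholder; substitute your own.
    """
    return [
        # Run the plugin installer inside the running container.
        ["docker", "exec", container, "bin/elasticsearch-plugin", "install", PLUGIN_NAME],
        # Restart so that Elasticsearch loads the newly installed plugin.
        ["docker", "restart", container],
    ]


def install_smartcn(container="elasticsearch", run=subprocess.run):
    """Execute the commands in order, stopping at the first failure."""
    for command in smartcn_install_commands(container):
        run(command, check=True)
```

Keeping the command list separate from its execution makes it possible to print or review the commands before running them against a real container.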