What's text normalization?
Text normalization is the process of transforming text into a common form. This is necessary to remove insignificant differences between words that are otherwise identical.
Let's take the word déjà-vu as an example.
In a string comparison, the word deja-vu is not equal to déjà-vu. Even Déjà-vu is not equal to déjà-vu. Similarly, Michè'le is not equal to Michèle. All these words (that is, tokens) are unequal because Elasticsearch compares them at the byte level. This means that two tokens are considered the same only if they consist of exactly the same bytes.
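To make the byte-level difference concrete, here is a minimal Python sketch. It is plain Python, not Elasticsearch code. It assumes the tokens are encoded as UTF-8 and written with precomposed accented characters. The helper names utf8_bytes and fold are hypothetical, and fold only hints at what reducing tokens to a common form could look like.

```python
import unicodedata


def utf8_bytes(token: str) -> bytes:
    """Return the UTF-8 bytes of a token (UTF-8 is an assumption of this sketch)."""
    return token.encode("utf-8")


def fold(token: str) -> str:
    """Hypothetical helper: lowercase the token and strip combining accents."""
    decomposed = unicodedata.normalize("NFD", token.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Escapes are used so the accented letters are unambiguously precomposed:
# \u00e9 is e with acute, \u00e0 is a with grave, \u00c9 is capital E with acute.
accented = "d\u00e9j\u00e0-vu"
plain = "deja-vu"
capitalized = "D\u00e9j\u00e0-vu"

for token in (accented, plain, capitalized):
    print(token, utf8_bytes(token).hex(" "))

# The raw tokens differ byte for byte...
print(utf8_bytes(accented) == utf8_bytes(plain))
print(utf8_bytes(accented) == utf8_bytes(capitalized))

# ...but they collapse to one form after folding.
print(fold(accented) == fold(plain) == fold(capitalized))
```

The two raw comparisons print False and the folded comparison prints True, which is the effect a normalization step aims for before tokens are compared.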
However, these words have similar meanings. In other words, one user searching for déjà-vu and another searching for deja-vu or deja vu are looking for the same thing. It should also be noted that the Unicode standard allows equivalent text to be created in multiple ways.
For example, take the letters è (Latin small letter e with grave) and é (Latin small letter e with acute...