The purpose of an analyzer is to generate terms from a document and to create inverted indexes (such as lists of unique words and the document IDs they appear in, or a list of word frequencies). An analyzer must have only one tokenizer and, optionally, as many character filters and token filters as the user wants. Whether it is a built-in analyzer or a custom analyzer, analyzers are just an aggregation of the processes of these three building blocks, as illustrated in the following diagram:
Recall from Chapter 1, Overview of Elasticsearch 7, (you can refer to the Analyzer section) that a standard analyzer is composed of a standard tokenizer and a lowercase token filter. A standard tokenizer provides grammar-based tokenization, while a lowercase token filter normalizes tokens to lowercase. Let's suppose that the input string is an HTML text string...