You're reading from Elasticsearch 8.x Cookbook Over 180 recipes to perform fast, scalable, and reliable searches for your enterprise

Product type Paperback

Published in May 2022

Publisher Packt

ISBN-13 9781801079815

Length 750 pages

Edition 5th Edition

Languages Java

Tools Elasticsearch

Concepts Enterprise Search

Author (1): Alberto Paro



Specifying different analyzers

In the previous recipes, we learned how to map different fields and objects in Elasticsearch, and we described how easy it is to change the standard analyzer with the analyzer and search_analyzer properties.

In this recipe, we will look at several analyzers and learn how to use them to improve indexing and searching quality.

Getting ready

You will need an up-and-running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe of Chapter 1, Getting Started.

How to do it…

Every text field lets you specify, as field parameters, one analyzer for indexing and another for searching.

For example, if we want the name field to use the standard analyzer for indexing and the simple analyzer for searching, the mapping will be as follows:

{
  "name": {
    "type": "text",
    "analyzer": "standard",
    "search_analyzer": "simple"
  }
}
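To show where such a field definition sits in a complete request, here is a minimal sketch that builds the body for an index-creation call. It assumes Elasticsearch 8.x, where text fields use the text type and the analyzer parameter sets the index-time analyzer. The index name my-index and the helper build_create_index_body are hypothetical names used only for illustration.

```python
import json

# Hypothetical index name, used only for illustration.
INDEX_NAME = "my-index"


def build_create_index_body() -> dict:
    """Return a create-index request body with per-field analyzers.

    Field definitions live under mappings.properties. The "name" field
    is analyzed with "standard" at index time and "simple" at search time.
    """
    return {
        "mappings": {
            "properties": {
                "name": {
                    "type": "text",
                    "analyzer": "standard",
                    "search_analyzer": "simple",
                }
            }
        }
    }


body = build_create_index_body()
# Print the request as it could be sent to the REST API.
print(f"PUT /{INDEX_NAME}")
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted into any HTTP client or console that talks to the cluster.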

How it works…

The concept of the analyzer comes from Lucene (the core of Elasticsearch). An analyzer is a Lucene element composed of optional character filters, a tokenizer that splits text into tokens, and zero or more token filters. The token filters carry out token manipulation such as lowercasing, normalization, removing stop words, stemming, and so on.
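The tokenizer-plus-filters idea can be sketched in a few lines of plain Python. This is an illustrative toy, not Lucene's implementation or API: word_tokenizer, lowercase_filter, stop_filter, analyze, and the small STOP_WORDS set are all hypothetical stand-ins.

```python
import re
from typing import Callable, Iterable, List, Sequence

# Toy stand-ins for Lucene components (hypothetical, not the real API).
TokenFilter = Callable[[Iterable[str]], Iterable[str]]

# A tiny stop-word list chosen for the example only.
STOP_WORDS = {"a", "an", "and", "of", "the", "to"}


def word_tokenizer(text: str) -> List[str]:
    """Tokenizer: split the text into runs of word characters."""
    return re.findall(r"\w+", text)


def lowercase_filter(tokens: Iterable[str]) -> Iterable[str]:
    """Token filter: lowercase every token."""
    return (token.lower() for token in tokens)


def stop_filter(tokens: Iterable[str]) -> Iterable[str]:
    """Token filter: drop tokens that appear in STOP_WORDS."""
    return (token for token in tokens if token not in STOP_WORDS)


def analyze(
    text: str,
    filters: Sequence[TokenFilter] = (lowercase_filter, stop_filter),
) -> List[str]:
    """Run the tokenizer, then apply each token filter in order."""
    tokens: Iterable[str] = word_tokenizer(text)
    for token_filter in filters:
        tokens = token_filter(tokens)
    return list(tokens)


print(analyze("The Quick Brown Foxes"))  # ['quick', 'brown', 'foxes']
```

The order of the filters matters: lowercasing runs before stop-word removal so that "The" matches the lowercase entry in the stop list.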

During the indexing phase, when Elasticsearch processes a field that must be indexed, it chooses an analyzer. It first checks the analyzer parameter in the field mapping, then the default analyzer in the index settings (analysis.analyzer.default), and finally falls back to the standard analyzer.
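The lookup order at index time can be sketched as a small function. It assumes the order documented for Elasticsearch 8.x: the analyzer parameter on the field, then the analysis.analyzer.default index setting, then the standard analyzer. The function resolve_index_analyzer and its dictionary inputs are hypothetical and only mimic the shape of a mapping and of index settings.

```python
def resolve_index_analyzer(field_mapping: dict, index_settings: dict) -> str:
    """Return the name of the analyzer used to index a text field.

    Assumed resolution order (Elasticsearch 8.x):
      1. the "analyzer" parameter in the field mapping
      2. the analyzer named "default" in the index analysis settings
      3. the built-in "standard" analyzer
    """
    # 1. An analyzer set directly on the field wins.
    field_analyzer = field_mapping.get("analyzer")
    if field_analyzer:
        return field_analyzer

    # 2. Otherwise use the index-level default analyzer, if one is defined.
    analyzers = index_settings.get("analysis", {}).get("analyzer", {})
    if "default" in analyzers:
        return "default"

    # 3. Fall back to the standard analyzer.
    return "standard"


settings_with_default = {
    "analysis": {"analyzer": {"default": {"type": "whitespace"}}}
}

print(resolve_index_analyzer({"type": "text", "analyzer": "simple"}, {}))
print(resolve_index_analyzer({"type": "text"}, settings_with_default))
print(resolve_index_analyzer({"type": "text"}, {}))
```

The three calls exercise one branch each: a field-level analyzer, an index-level default, and the fallback.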

Choosing the correct analyzer is essential to getting good results during the query phase.

Elasticsearch provides several analyzers in its standard installation. The following table shows the most common ones:

Figure 2.4 – List of the most common general-purpose analyzers

For language-specific text, Elasticsearch also provides a set of analyzers, each aimed at a particular language, such as Arabic, Armenian, Basque, Brazilian, Bulgarian, Catalan, Chinese, CJK, Czech, Danish, Dutch, English, Finnish, French, Galician, German, Greek, Hindi, Hungarian, Indonesian, Italian, Norwegian, Persian, Portuguese, Romanian, Russian, Spanish, Swedish, Turkish, and Thai.
