Elastic Stack 8.x Cookbook

You're reading from   Elastic Stack 8.x Cookbook Over 80 recipes to perform ingestion, search, visualization, and monitoring for actionable insights

Product type: Paperback
Published in: Jun 2024
Publisher: Packt
ISBN-13: 9781837634293
Length: 688 pages
Edition: 1st Edition
Authors (2): Yazid Akadiri, Huage Chen

Using an analyzer

In this recipe, we will learn how to set up and use a specific analyzer for text analysis. Indexing data in Elasticsearch, especially for search use cases, requires that you define how text is processed before it is indexed; this is what analyzers do.

Analyzers in Elasticsearch handle tokenization and normalization functions. Elasticsearch offers a variety of ready-made analyzers for common scenarios, as well as language-specific analyzers for English, German, Spanish, French, Hindi, and so on.
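To make the two stages concrete, here is a minimal pure-Python sketch of tokenization followed by normalization. It is only an approximation for illustration: the regular expression is a rough stand-in for the standard tokenizer (which follows Unicode text segmentation rules), and the function names `tokenize` and `normalize` are hypothetical, not part of Elasticsearch.

```python
import re


def tokenize(text: str) -> list[str]:
    """Split text into word tokens (a rough stand-in for the standard tokenizer)."""
    # \w+ keeps runs of letters, digits, and underscores; punctuation is dropped.
    return re.findall(r"\w+", text)


def normalize(tokens: list[str]) -> list[str]:
    """Lowercase every token, as a lowercase token filter would."""
    return [token.lower() for token in tokens]


tokens = normalize(tokenize("The Quick-Brown Fox!"))
print(tokens)  # ['the', 'quick', 'brown', 'fox']
```

A real analyzer chains the same kinds of steps inside Elasticsearch: one tokenizer, then zero or more token filters applied in order.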

In this recipe, we will see how to configure the standard analyzer with the English stop word list.

Getting ready

Make sure that you completed the Adding data from the Elasticsearch client recipe. Also, make sure to download the following sample Python script from the GitHub repository: https://github.com/PacktPublishing/Elastic-Stack-8.x-Cookbook/blob/main/Chapter2/python-client-sample/sampledata_analyzer.py.

The command snippets of this recipe are available at https://github.com/PacktPublishing/Elastic-Stack-8.x-Cookbook/blob/main/Chapter2/snippets.md#using-analyzer.

How to do it…

In this recipe, you will learn how to configure your Python code to interface with an Elasticsearch cluster, define a custom English text analyzer, create a new index with the analyzer, and verify that the index uses the specified settings.

Let’s look at the provided Python script:

  1. At the beginning of the script, we create an instance of the Elasticsearch client:
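    from elasticsearch import Elasticsearch

    # ES_CID, ES_USER, and ES_PWD hold the cloud ID and credentials defined earlier in the script
    es = Elasticsearch(
        cloud_id=ES_CID,
        basic_auth=(ES_USER, ES_PWD)
    )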
  2. To ensure that we do not use an existing movies index, the script includes code that deletes any such index:
    if es.indices.exists(index="movies"):
        print("Deleting existing movies index...")
        es.options(ignore_status=[404, 400]).indices.delete(index="movies")
  3. Next, we define the analyzer configuration:
    index_settings = {
        "analysis": {
            "analyzer": {
                "standard_with_english_stopwords": {
                    "type": "standard",
                    "stopwords": "_english_"
                }
            }
        }
    }
  4. We then create the index with settings that define the analyzer:
    es.indices.create(index='movies', settings=index_settings)
  5. Finally, to verify the successful addition of the analyzer, we retrieve the settings:
    settings = es.indices.get_settings(index='movies')
    analyzer_settings = settings['movies']['settings']['index']['analysis']
    print(f"Analyzer used for the index: {analyzer_settings}")
  6. After reviewing the script, execute it with the following command, and you should see the output shown in Figure 2.10:
    $ python sampledata_analyzer.py
Figure 2.10 – The output of the sampledata_analyzer.py script

Alternatively, you can go to Kibana | Dev Tools and issue the following request:

GET /movies/_settings

In the response, you should see the settings currently applied to the movies index with the configured analyzer, as shown in Figure 2.11:

Figure 2.11 – The analyzer configuration in the index settings
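If you prefer to check the result in code rather than by reading the printed output, the lookup can be wrapped in a small helper. The following is a sketch under stated assumptions: `get_analyzer` is a hypothetical helper, it expects the settings response as a plain dict, and `sample_response` is a hand-written dict that only mimics the nesting used in the script (a real response carries additional keys).

```python
def get_analyzer(settings_response: dict, index: str, analyzer: str):
    """Return one analyzer definition from a get-settings style dict, or None."""
    return (
        settings_response.get(index, {})
        .get("settings", {})
        .get("index", {})
        .get("analysis", {})
        .get("analyzer", {})
        .get(analyzer)
    )


# Hand-written sample shaped like the path used in the script:
# settings['movies']['settings']['index']['analysis']
sample_response = {
    "movies": {
        "settings": {
            "index": {
                "analysis": {
                    "analyzer": {
                        "standard_with_english_stopwords": {
                            "type": "standard",
                            "stopwords": "_english_",
                        }
                    }
                }
            }
        }
    }
}

definition = get_analyzer(sample_response, "movies", "standard_with_english_stopwords")
print(definition)
```

Returning None for a missing index or analyzer avoids a KeyError when the index was created without the expected settings.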

How it works...

The settings block of the index configuration is where the analyzer is defined. Because we are modifying the built-in standard analyzer, we give the new analyzer a unique name (standard_with_english_stopwords) and set its type to standard. Any text field whose mapping references this analyzer is then analyzed by it at index time; defining the analyzer in the settings does not by itself apply it to a field. To test the analyzer, we can use the _analyze endpoint on the index:

POST movies/_analyze
{
  "text": "A young couple decides to elope.",
  "analyzer": "standard_with_english_stopwords"
}

It should yield the results shown in Figure 2.12:

Figure 2.12 – The index result of a text with the stopword analyzer
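For readers without the figure at hand, the following pure-Python sketch approximates what the analyzer does to the sample sentence. It is a simulation, not Elasticsearch output: `ENGLISH_STOP_WORDS` is my transcription of the default English stop word list and should be checked against the official documentation, the regular expression only approximates the standard tokenizer, and the `analyze` function is hypothetical. Positions are counted before stop words are removed, which is why gaps can appear.

```python
import re

# Assumed contents of the _english_ stop word list (verify against the docs).
ENGLISH_STOP_WORDS = {
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if",
    "in", "into", "is", "it", "no", "not", "of", "on", "or", "such", "that",
    "the", "their", "then", "there", "these", "they", "this", "to", "was",
    "will", "with",
}


def analyze(text: str) -> list[dict]:
    """Tokenize, lowercase, and drop stop words while keeping original positions."""
    result = []
    for position, match in enumerate(re.finditer(r"\w+", text)):
        token = match.group().lower()
        if token not in ENGLISH_STOP_WORDS:
            result.append({"token": token, "position": position})
    return result


for entry in analyze("A young couple decides to elope."):
    print(entry)
# {'token': 'young', 'position': 1}
# {'token': 'couple', 'position': 2}
# {'token': 'decides', 'position': 3}
# {'token': 'elope', 'position': 5}
```

In this simulation, "a" and "to" are dropped as stop words, so only four tokens remain.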

There’s more…

While Elasticsearch offers many built-in analyzers for different languages and text types, you can also define custom analyzers. These allow you to specify how text is broken down and modified for indexing or searching, using components such as tokenizers, token filters, and character filters – either those provided by Elasticsearch or custom ones you create. For example, you can design an analyzer that converts text to lowercase, removes common words, substitutes synonyms, and strips accents.

Reasons for needing a custom analyzer may include the following:

  • Handling various languages and scripts that require special processing, such as Chinese, Japanese, and Arabic
  • Enhancing the relevance and comprehensiveness of search results using synonyms, stemming, lemmatization, and so on
  • Normalizing text by removing punctuation, extra whitespace, and accents, and by making it case-insensitive
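As an illustration of the lowercase, stop word, synonym, and accent-stripping example mentioned above, here is one possible custom analyzer definition written as Python dicts. This is a sketch under stated assumptions: the names `folding_synonym_analyzer`, `english_stop`, and `movie_synonyms`, the synonym pair, and the `plot` field are all hypothetical and chosen only for illustration; the component types (`custom`, `standard`, `lowercase`, `asciifolding`, `stop`, `synonym`) are built into Elasticsearch.

```python
import json

custom_settings = {
    "analysis": {
        "filter": {
            # Removes common English words
            "english_stop": {"type": "stop", "stopwords": "_english_"},
            # Hypothetical synonym rule: treat "film" and "movie" as equivalent
            "movie_synonyms": {"type": "synonym", "synonyms": ["film, movie"]},
        },
        "analyzer": {
            "folding_synonym_analyzer": {
                "type": "custom",
                "tokenizer": "standard",
                # Filters run in the order listed
                "filter": ["lowercase", "asciifolding", "english_stop", "movie_synonyms"],
            }
        },
    }
}

# Hypothetical text field that opts in to the analyzer through its mapping
index_mappings = {
    "properties": {
        "plot": {"type": "text", "analyzer": "folding_synonym_analyzer"}
    }
}

body = {"settings": custom_settings, "mappings": index_mappings}
print(json.dumps(body, indent=2))
```

The printed JSON could serve as the body of an index creation request in Dev Tools, or the two dicts could be passed as the settings and mappings arguments of es.indices.create in the Python client.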