You're reading from Elastic Stack 8.x Cookbook Over 80 recipes to perform ingestion, search, visualization, and monitoring for actionable insights

Product type Paperback

Published in Jun 2024

Publisher Packt

ISBN-13 9781837634293

Length 688 pages

Edition 1st Edition

Tools

Elasticsearch

Concepts

Enterprise Search

Authors (2):

Yazid Akadiri

Huage Chen

Updating data in Elasticsearch

In this recipe, we will explore how to update data in Elasticsearch using the Python client.

Getting ready

Ensure that you have installed the Elasticsearch Python client and have successfully set up a connection to your Elasticsearch cluster (refer to the Adding data from the Elasticsearch client recipe). You will also need to have completed the previous recipe, which involves ingesting a document into the movies index.

Note

The following three recipes will use the same set of requirements.

How to do it…

In this recipe, we’re going to update the director field of a particular document within the movies index. The director field will be changed from its current value, D.W. Griffith, to a new value, Clint Eastwood. The following are the steps you’ll need to follow in your Python script to perform this update and confirm that it has been successfully applied. Let’s inspect the Python script that we will use to update the ingested document (https://github.com/PacktPublishing/Elastic-Stack-8.x-Cookbook/blob/main/Chapter2/python-client-sample/sampledata_update.py):

First, we retrieve the ID of the document we intend to update, which the previous recipe saved in the tmp.txt file, and define the new value for the director field:
```
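index_name = 'movies'
document_id = ''
# Read the document ID saved by the previous recipe
with open('tmp.txt', 'r') as file:
    # strip() drops any trailing newline so the ID matches exactly
    document_id = file.read().strip()
# Partial document: only the fields listed here are changed
document = {
    'director': 'Clint Eastwood'
}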
```

We can now check document_id, verify that the document exists in the index, and then perform the update operation:

```
# Update the document in Elasticsearch if document_id is valid
if document_id != '':
    if es.exists(index=index_name, id=document_id):
        response = es.update(index=index_name, id=document_id,
                             doc=document)
        print(f"Update status: {response['result']}")
```

Once the document is updated, retrieve it from Elasticsearch and print it to verify that the change was applied:
```
updated_document = es.get(index=index_name, id=document_id)
print("Updated document:")
print(updated_document)
```
After inspecting the script, let’s run it with the following command:
```
$ python sampledata_update.py
```

Figure 2.6 – The output of the sampledata_update.py script

You should see that the _version value has been incremented and that the director field now contains Clint Eastwood.

How it works...

Every document in Elasticsearch carries a _version field. Documents are immutable, so they cannot be modified in place. When you update an existing document, a new document is generated with an incremented version, while the previous document is flagged for deletion.
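The version bump described above can be observed directly. The following is a minimal sketch, assuming `es` is the already-connected 8.x Python client from the earlier recipes; the helper name `update_and_compare_versions` is ours and is not part of the book's scripts:

```python
def update_and_compare_versions(es, index_name, document_id, partial_doc):
    """Return the document's _version before and after a partial update.

    `es` is assumed to be an already-connected Elasticsearch 8.x Python client.
    """
    # _version as stored before the update
    before = es.get(index=index_name, id=document_id)["_version"]
    # Partial update: only the fields in partial_doc are changed
    es.update(index=index_name, id=document_id, doc=partial_doc)
    # _version of the document that replaced the old one
    after = es.get(index=index_name, id=document_id)["_version"]
    return before, after


# Example usage, assuming the `es` client and `document_id` from this recipe:
# before, after = update_and_compare_versions(
#     es, "movies", document_id, {"director": "Clint Eastwood"})
# print(f"_version went from {before} to {after}")
```

Note that if the partial document does not change any field value, Elasticsearch reports the result as a noop and the version stays the same.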

There’s more…

We have just seen how to update a single document in Elasticsearch. Updating documents one at a time is inefficient when many of them need to change. To update multiple documents that match a specific query, you can use the Update By Query API. This allows you to define a query to select the documents you want to update and specify the changes to be made. Here is an example of how to call it from the Python client:

```
q = {
    "script": {
        "source": "ctx._source.genre = 'comedies'",
        "lang": "painless"
    },
    "query": {
        "bool": {
            "must": [
                {
                    "term": {
                        "genre": "comedy"
                    }
                }
            ]
        }
    }
}
es.update_by_query(body=q, index=index_name)
```

The full Python script is available here: https://github.com/PacktPublishing/Elastic-Stack-8.x-Cookbook/blob/main/Chapter2/python-client-sample/sampledata_update_by_query.py.

Note

The update logic used here is written in Painless, Elasticsearch's scripting language; we will see more examples in Chapter 6.
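An Update By Query call also returns a summary of what it did, which is worth checking. The sketch below is one possible variation rather than the book's script: it passes the query and the script as keyword arguments, supplies the new value through script parameters, and sets `conflicts="proceed"` so that version conflicts are counted instead of aborting the request. The helper name `retag_genre` is hypothetical, and `es` is assumed to be a connected 8.x Python client:

```python
def retag_genre(es, index_name, old_genre, new_genre):
    """Rewrite `genre` on every matching document and summarise the outcome."""
    response = es.update_by_query(
        index=index_name,
        # Select the documents to change
        query={"term": {"genre": old_genre}},
        # Painless script; the new value is passed in as a parameter
        script={
            "lang": "painless",
            "source": "ctx._source.genre = params.new_genre",
            "params": {"new_genre": new_genre},
        },
        # Count version conflicts instead of stopping at the first one
        conflicts="proceed",
    )
    # Keep only the counters that tell us how the update went
    return {key: response[key] for key in ("total", "updated", "version_conflicts")}


# Example usage, assuming the `es` client from this recipe:
# print(retag_genre(es, "movies", "comedy", "comedies"))
```

Comparing `total` with `updated` is a quick way to spot documents that matched the query but were skipped.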

The other way to update multiple documents in a single request is via Elasticsearch’s Bulk API. The Bulk API can be used to insert, update, and delete multiple documents efficiently. We will learn how to use the Bulk API to ingest multiple documents at the end of this chapter. For more detailed information, refer to the following documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html.
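As a rough preview of what a bulk update looks like, each update is expressed as an action line naming the target document, followed by a line holding the partial document. The sketch below builds that list in plain Python; the helper `build_bulk_updates` and the document IDs are hypothetical and used for illustration only:

```python
def build_bulk_updates(index_name, changes):
    """Turn {document_id: partial_doc} into Bulk API action/payload pairs."""
    operations = []
    for document_id, partial_doc in changes.items():
        # Action line: which document to update
        operations.append({"update": {"_index": index_name, "_id": document_id}})
        # Payload line: the fields to change
        operations.append({"doc": partial_doc})
    return operations


# Hypothetical IDs, for illustration only
changes = {
    "id-1": {"director": "Clint Eastwood"},
    "id-2": {"genre": "comedies"},
}
operations = build_bulk_updates("movies", changes)

# With a connected 8.x client `es`, the request could then be sent as:
# response = es.bulk(operations=operations)
# print(response["errors"])  # True if at least one action failed
```

The bulk response lists one result per action, so a failure on one document does not prevent the others from being applied.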

To retrieve the ID of the document we want to update, we rely on a tmp.txt file where the ID of a previously created document was saved. Alternatively, you can find the document's ID by searching the movies index from Kibana. Go to Kibana | Dev Tools and execute the following command:

```
GET movies/_search
```

This query should return a list of hits that display all documents in the index, along with their respective IDs, as shown in Figure 2.7. Using these results, locate and record the ID of the document you would like to update:
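The same lookup can be done from Python instead of Dev Tools. This is a minimal sketch, assuming `es` is the connected 8.x client used throughout this recipe; the helper name `list_document_ids` is ours:

```python
def list_document_ids(es, index_name, size=10):
    """Return (document ID, source) pairs for up to `size` documents."""
    # match_all selects every document, like GET <index>/_search with no body
    response = es.search(index=index_name, query={"match_all": {}}, size=size)
    return [(hit["_id"], hit["_source"]) for hit in response["hits"]["hits"]]


# Example usage, assuming the `es` client from this recipe:
# for document_id, source in list_document_ids(es, "movies"):
#     print(document_id, source)
```

Each hit exposes its ID under `_id`, which is the value to pass as `id` in the update call.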