Elastic Stack 8.x Cookbook

Over 80 recipes to perform ingestion, search, visualization, and monitoring for actionable insights

Product type Paperback
Published in Jun 2024
Publisher Packt
ISBN-13 9781837634293
Length 688 pages
Edition 1st Edition
Authors (2): Yazid Akadiri, Huage Chen

Deleting data in Elasticsearch

In this recipe, we will explore how to delete a document from an Elasticsearch index.

Getting ready

Refer to the requirements for the Updating data in Elasticsearch recipe.

Make sure to download the following Python script from the GitHub repository: https://github.com/PacktPublishing/Elastic-Stack-8.x-Cookbook/blob/main/Chapter2/python-client-sample/sampledata_delete.py.

The snippets of the recipe are available at https://github.com/PacktPublishing/Elastic-Stack-8.x-Cookbook/blob/main/Chapter2/snippets.md#deleting-data-in-elasticsearch.

How to do it…

  1. First, let us inspect the sampledata_delete.py Python script. As in the previous recipe, we need to retrieve document_id from the tmp.txt file:
    with open('tmp.txt', 'r') as file:
        # strip surrounding whitespace so a trailing newline does not end up in the ID
        document_id = file.read().strip()
  2. We can now check document_id, verify that the document exists in the index, and then perform the delete operation by using the previously obtained document_id:
    if document_id != '':
        if es.exists(index=index_name, id=document_id):
            # delete the document in Elasticsearch
            response = es.delete(index=index_name, id=document_id)
            print(f"delete status: {response['result']}")
  3. After reviewing the delete script, execute it with the following command:
    $ python sampledata_delete.py

    You should see the following output:

Figure 2.8 – The output of the sampledata_delete.py script

  4. For further verification, return to the Dev Tools in Kibana and execute the search request again on the movies index:
    GET movies/_search

    This time, the result should reflect the deletion:

Figure 2.9 – The search results in the movies index after deletion

The total hits will now be 0, confirming that the document has been successfully deleted.

How it works...

When a document is deleted in Elasticsearch, it is not immediately removed from the index. Instead, Elasticsearch marks the document as deleted so that it no longer appears in search results. The document remains on disk until a segment merge, a routine background task, physically expunges the deleted documents from the index.

This mechanism allows Elasticsearch to handle deletions efficiently. The underlying Lucene segments are immutable, so removing a document outright would mean rewriting the segment that contains it. By marking documents as deleted instead, Elasticsearch defers that cost to the merges it already performs in the background.
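
To make the idea concrete, here is a toy model of the mark-then-merge behavior described above. It is a conceptual sketch only and does not reflect the actual Elasticsearch or Lucene implementation; ToySegment and merge are hypothetical names chosen for illustration.

```python
class ToySegment:
    """Conceptual model of an immutable segment plus a set of delete markers."""

    def __init__(self, docs):
        self._docs = dict(docs)  # stored documents, never modified after creation
        self._deleted = set()    # IDs that have been marked as deleted

    def delete(self, doc_id):
        """Mark a document as deleted without touching the stored data."""
        if doc_id in self._docs and doc_id not in self._deleted:
            self._deleted.add(doc_id)
            return True
        return False

    def live_docs(self):
        """Documents a search would still see."""
        return {k: v for k, v in self._docs.items() if k not in self._deleted}

    def stored_count(self):
        """Documents physically held by the segment, including deleted ones."""
        return len(self._docs)


def merge(segments):
    """Build one new segment from live documents only; deleted ones are dropped here."""
    merged = {}
    for segment in segments:
        merged.update(segment.live_docs())
    return ToySegment(merged)


if __name__ == "__main__":
    seg_a = ToySegment({"1": "first movie", "2": "second movie"})
    seg_b = ToySegment({"3": "third movie"})
    seg_a.delete("2")
    print(seg_a.stored_count(), len(seg_a.live_docs()))  # still stored, but hidden
    print(merge([seg_a, seg_b]).stored_count())          # space reclaimed after merge
```

In this model, delete() is cheap because it only records a marker, while the expensive copy happens once, inside merge().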

There’s more…

So far we have deleted a single document by its document_id, which is inefficient when many documents must be removed. For such scenarios, the Delete By Query API is more suitable, as in the request that follows the note below.

Note

Before executing the upcoming query, you need to re-index the document, since it was deleted earlier in the recipe. To add it back to the movies index, run the sampledata_index.py Python script.

POST /movies/_delete_by_query
{
  "query": {
    "match": {
      "genre": "comedy"
    }
  }
}

The preceding query will delete all movies matching the comedy genre in our index.

When deleting many documents, the best practice is to use Delete By Query with the slices parameter. Slicing splits a large deletion task into smaller sub-tasks that run in parallel, which typically shortens the overall operation and spreads the work across the shards of the index. Setting slices to auto lets Elasticsearch choose the number of slices itself.

See also

For more details on the Delete By Query feature, refer to the official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-delete-by-query.html.
