Deleting data in Elasticsearch
In this recipe, we will explore how to delete a document from an Elasticsearch index.
Getting ready
Refer to the requirements for the Updating data in Elasticsearch recipe.
Make sure to download the following Python script from the GitHub repository: https://github.com/PacktPublishing/Elastic-Stack-8.x-Cookbook/blob/main/Chapter2/python-client-sample/sampledata_delete.py.
The snippets of the recipe are available at https://github.com/PacktPublishing/Elastic-Stack-8.x-Cookbook/blob/main/Chapter2/snippets.md#deleting-data-in-elasticsearch.
How to do it…
- First, let us inspect the sampledata_delete.py Python script. Like the process in the previous recipe, we need to retrieve document_id from the tmp.txt file:
    with open('tmp.txt', 'r') as file:
        document_id = file.read()
- We can now check document_id, verify that the document exists in the index, and then perform the delete operation by using the previously obtained document_id:
    if document_id != '':
        if es.exists(index=index_name, id=document_id):
            # delete the document in Elasticsearch
            response = es.delete(index=index_name, id=document_id)
            print(f"delete status: {response['result']}")
- After reviewing the delete script, execute it with the following command:
    $ python sampledata_delete.py
You should see the following output:
Figure 2.8 – The output of the sampledata_delete.py script
- For further verification, return to the Dev Tools in Kibana and execute the search request again on the movies index:
    GET movies/_search
This time, the result should reflect the deletion:
Figure 2.9 – The search results in the movies index after deletion
The total hits will now be 0, confirming that the document has been successfully deleted.
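The read, check, and delete steps above can be sketched as two small helpers. This is a minimal sketch, not the book's script: the helper names read_document_id and delete_if_exists are hypothetical, and es is assumed to be an already connected Elasticsearch Python client like the one used in the recipe. Stripping whitespace from the file content is an added precaution, so that a trailing newline does not become part of the ID.

```python
from pathlib import Path


def read_document_id(path="tmp.txt"):
    """Return the saved document ID, or '' if the file is missing or empty.

    Hypothetical helper: the recipe's script reads the file directly.
    """
    id_file = Path(path)
    if not id_file.is_file():
        return ""
    # strip() removes a trailing newline that would otherwise be part of the ID
    return id_file.read_text().strip()


def delete_if_exists(es, index_name, document_id):
    """Delete one document by ID.

    Returns the 'result' field of the delete response, or None when the
    ID is empty or the document is not in the index.
    """
    if not document_id:
        return None
    if not es.exists(index=index_name, id=document_id):
        return None
    response = es.delete(index=index_name, id=document_id)
    return response["result"]
```

With a connected client, calling delete_if_exists(es, "movies", read_document_id()) performs the same check-then-delete sequence as the script and returns None on a second run, once the document is gone.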
How it works...
When a document is deleted in Elasticsearch, it is not immediately removed from the index. Instead, Elasticsearch marks the document as deleted, so it no longer appears in search results. The marked documents remain on disk until a segment merge, a routine background optimization task, physically expunges them from the index.
This mechanism allows Elasticsearch to handle deletions efficiently. By marking documents as deleted rather than expunging them outright, Elasticsearch avoids costly segment reorganizations within the index. The removal occurs during optimized, controlled background tasks.
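To make the mark-then-merge idea concrete, here is a toy in-memory model. It is only an illustration under simplifying assumptions: the Segment and ToyIndex classes are hypothetical and do not reflect how Elasticsearch or Lucene store data internally. The point is that a delete only records a marker, while the stored data shrinks later, when segments are merged.

```python
class Segment:
    """A batch of documents that is never rewritten, plus deletion markers."""

    def __init__(self, docs):
        self.docs = dict(docs)   # doc_id -> source, fixed after creation
        self.deleted = set()     # IDs marked as deleted

    def live_docs(self):
        """Documents that are stored and not marked as deleted."""
        return {i: d for i, d in self.docs.items() if i not in self.deleted}


class ToyIndex:
    """Hypothetical index made of segments; not Elasticsearch's real design."""

    def __init__(self):
        self.segments = []

    def add_segment(self, docs):
        self.segments.append(Segment(docs))

    def delete(self, doc_id):
        """Mark a document as deleted without rewriting its segment."""
        for seg in self.segments:
            if doc_id in seg.docs and doc_id not in seg.deleted:
                seg.deleted.add(doc_id)
                return True
        return False

    def search_ids(self):
        """Searches only see live documents."""
        return sorted(i for seg in self.segments for i in seg.live_docs())

    def stored_count(self):
        """Documents still held in segments, including marked ones."""
        return sum(len(seg.docs) for seg in self.segments)

    def merge(self):
        """Copy live documents into one new segment, dropping marked ones."""
        merged = {}
        for seg in self.segments:
            merged.update(seg.live_docs())
        self.segments = [Segment(merged)]
```

In this model, a deleted document disappears from search_ids() immediately, but stored_count() only drops after merge() runs, which mirrors the behavior described above.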
There’s more…
While we have discussed deleting documents by document_id, this might not be the most efficient approach for deleting multiple documents. For such scenarios, the Delete By Query API is more suitable, as in the following example:
Note
Before executing the upcoming query, it is necessary to re-index the document, since it was deleted earlier in the recipe. Ensure that you have re-added the document to the movies index by executing the sampledata_index.py Python script.
POST /movies/_delete_by_query
{
  "query": {
    "match": {
      "genre": "comedy"
    }
  }
}
The preceding query will delete all movies matching the comedy genre in our index.
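The same request can also be issued from Python. The sketch below assumes an already connected 8.x Elasticsearch Python client passed in as es, and it adds a safety step that is not part of the recipe: it counts the matching documents first and only deletes when dry_run is False. The function name delete_by_genre and the dry_run flag are hypothetical.

```python
def delete_by_genre(es, index_name, genre, dry_run=True):
    """Count, and optionally delete, all documents matching a genre.

    Hypothetical helper. 'es' is assumed to be a connected Elasticsearch
    Python client (8.x style keyword arguments).
    """
    # Same match query as the Dev Tools request in the recipe
    query = {"match": {"genre": genre}}

    # Preview how many documents the query matches before deleting anything
    matched = es.count(index=index_name, query=query)["count"]
    if dry_run or matched == 0:
        return {"matched": matched, "deleted": 0}

    response = es.delete_by_query(index=index_name, query=query)
    return {"matched": matched, "deleted": response["deleted"]}
```

Calling delete_by_genre(es, "movies", "comedy") only reports the number of matches. Passing dry_run=False performs the deletion and returns the count reported by Elasticsearch.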
Also, when deleting many documents, the best practice is to use the Delete By Query API with the slices parameter to improve performance. Slicing splits one large deletion task into smaller sub-tasks that run in parallel, which makes large-scale deletions faster and easier to manage. The parameter accepts either a number of slices or the value auto, which lets Elasticsearch choose a suitable number.
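A sliced deletion could look like the following from Python. This is a sketch under assumptions rather than a tuned implementation: es is assumed to be a connected 8.x Elasticsearch Python client, the helper name sliced_delete_by_query is hypothetical, and the default of "auto" leaves the choice of slice count to Elasticsearch.

```python
def sliced_delete_by_query(es, index_name, query, slices="auto"):
    """Run Delete By Query split into parallel slices.

    Hypothetical helper. 'slices' is either "auto" or a positive integer.
    """
    is_valid_count = (
        isinstance(slices, int)
        and not isinstance(slices, bool)  # bool is a subclass of int
        and slices >= 1
    )
    if slices != "auto" and not is_valid_count:
        raise ValueError("slices must be 'auto' or a positive integer")

    response = es.delete_by_query(index=index_name, query=query, slices=slices)

    # Keep only the fields that are most useful for checking the outcome
    return {
        "deleted": response["deleted"],
        "version_conflicts": response["version_conflicts"],
        "failures": response["failures"],
    }
```

For example, sliced_delete_by_query(es, "movies", {"match": {"genre": "comedy"}}) runs the earlier comedy deletion with automatic slicing. Checking failures and version_conflicts in the returned summary is a simple way to confirm that every slice completed cleanly.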
See also
For more details on the Delete By Query feature, refer to the official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-delete-by-query.html.