Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Save more on your purchases! discount-offer-chevron-icon
Savings automatically calculated. No voucher code required.
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Newsletter Hub
Free Learning
Arrow right icon
timer SALE ENDS IN
0 Days
:
00 Hours
:
00 Minutes
:
00 Seconds
Arrow up icon
GO TO TOP
Learning Elasticsearch

You're reading from   Learning Elasticsearch Structured and unstructured data using distributed real-time search and analytics

Arrow left icon
Product type Paperback
Published in Jun 2017
Publisher Packt
ISBN-13 9781787128453
Length 404 pages
Edition 1st Edition
Arrow right icon
Author (1):
Arrow left icon
Abhishek Andhavarapu Abhishek Andhavarapu
Author Profile Icon Abhishek Andhavarapu
Abhishek Andhavarapu
Arrow right icon
View More author details
Toc

Table of Contents (11) Chapters Close

Preface 1. Introduction to Elasticsearch FREE CHAPTER 2. Setting Up Elasticsearch and Kibana 3. Modeling Your Data and Document Relations 4. Indexing and Updating Your Data 5. Organizing Your Data and Bulk Data Ingestion 6. All About Search 7. More Than a Search Engine (Geofilters, Autocomplete, and More) 8. How to Slice and Dice Your Data Using Aggregations 9. Production and Beyond 10. Exploring Elastic Stack (Elastic Cloud, Security, Graph, and Alerting)

Interacting with Elasticsearch

The primary way of interacting with Elasticsearch is via REST API. Elasticsearch provides JSON-based REST API over HTTP. By default, Elasticsearch REST API runs on port 9200. Anything from creating an index to shutting down a node is a simple REST call. The APIs are broadly classified into the following:

  • Document APIs: CRUD (Create Retrieve Update Delete) operations on documents
  • Search APIs: For all the search operations
  • Indices APIs: For managing indices (creating an index, deleting an index, and so on)
  • Cat APIs: Instead of JSON, the data is returned in tabular form
  • Cluster APIs: For managing the cluster

We have a chapter dedicated to each one of them to discuss more in detail. For example, indexing documents in Chapter 4, Indexing and Updating Your Data and search in Chapter 6, All About Search and so on. In this section, we will go through some basic CRUD using the Document APIs. This section is simply a brief introduction on how to manipulate data using Document APIs. To use Elasticsearch in your application, clients in all major languages, such as Java, Python, are also provided. The majority of the clients acts as a wrapper around the REST API.

To better explain the CRUD operations, imagine we are building an e-commerce site. And we want to use Elasticsearch to power its search functionality. We will use an index named chapter1 and store all the products in the type called product. Each product we want to index is represented by a JSON document. We will start by creating a new product document, and then we will retrieve a product by its identifier, followed by updating a product's category and deleting a product using its identifier.

Creating a document

A new document can be added using the Document API's. For the e-commerce example, to add a new product, we execute the following command. The body of the request is the product document we want to index.

PUT http://localhost:9200/chapter1/product/1
{
"title": "Learning Elasticsearch",
"author": "Abhishek Andhavarapu",
"category": "books"
}

Let's inspect the request:

INDEX chapter1
TYPE product
IDENTIFIER 1
DOCUMENT JSON
HTTP METHOD PUT

The document's properties, such as title, author, the category, are also known as fields, which are similar to SQL columns.

Elasticsearch will automatically create the index chapter1 and type product if they don't exist already. It will create the index with the default settings.

When we execute the preceding request, Elasticsearch responds with a JSON response, shown as follows:

{
"_index": "chapter1",
"_type": "product",
"_id": "1",
"_version": 1,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"created": true
}

In the response, you can see that Elasticsearch created the document and the version of the document is 1. Since you are creating the document using the HTTP PUT method, you are required to specify the document identifier. If you don’t specify the identifier, Elasticsearch will respond with the following error message:

No handler found for uri [/chapter1/product/] and method [PUT]

If you don’t have a unique identifier, you can let Elasticsearch assign an identifier for you, but you should use the POST HTTP method. For example, if you are indexing log messages, you will not have a unique identifier for each log message, and you can let Elasticsearch assign the identifier for you.

In general, we use the HTTP POST method for creating an object. The HTTP PUT method can also be used for object creation, where the client provides the unique identifier instead of the server assigning the identifier.

We can index a document without specifying a unique identifier as shown here:

POST http://localhost:9200/chapter1/product/
{
"title": "Learning Elasticsearch",
"author": "Abhishek Andhavarapu",
"category": "books"
}

In the above request, URL doesn't contain the unique identifier and we are using the HTTP POST method. Let's inspect the request:

INDEX chapter1
TYPE product
DOCUMENT JSON
HTTP METHOD POST

The response from Elasticsearch is shown as follows:

{
"_index": "chapter1",
"_type": "product",
"_id": "AVmKvtPwWuEuqke_aRsm",
"_version": 1,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"created": true
}

You can see from the response that Elasticsearch assigned the unique identifier AVmKvtPwWuEuqke_aRsm to the document and created flag is set to true. If a document with the same unique identifier already exists, Elasticsearch replaces the existing document and increments the document version. If you have to run the same PUT request from the beginning of the section, the response from Elasticsearch would be this:

{
"_index": "chapter1",
"_type": "product",
"_id": "1",
"_version": 2,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"created": false
}

In the response, you can see that the created flag is false since the document with id: 1 already exists. Also, observe that the version is now 2.

Retrieving an existing document

To retrieve an existing document, we need the index, type and a unique identifier of the document. Let’s try to retrieve the document we just indexed. To retrieve a document we need to use HTTP GET method as shown below:

GET http://localhost:9200/chapter1/product/1

Let’s inspect the request:

INDEX chapter1
TYPE product
IDENTIFIER 1
HTTP METHOD GET

Response from Elasticsearch as shown below contains the product document we indexed in the previous section:

{
"_index": "chapter1",
"_type": "product",
"_id": "1",
"_version": 2,
"found": true,
"_source": {
"title": "Learning Elasticsearch",
"author": "Abhishek Andhavarapu",
"category": "books"
}
}

The actual JSON document will be stored in the _source field. Also note the version in the response; every time the document is updated, the version is increased.

Updating an existing document

Updating a document in Elasticsearch is more complicated than in a traditional SQL database. Internally, Elasticsearch retrieves the old document, applies the changes, and re-inserts the document as a new document. The update operation is very expensive. There are different ways of updating a document. We will talk about updating a partial document here and in more detail in the Updating your data section in Chapter 4, Indexing and Updating Your Data.

Updating a partial document

We already indexed the document with the unique identifier 1, and now we need to update the category of the product from just books to technical books. We can update the document as shown here:

 POST http://localhost:9200/chapter1/product/1/_update
{
"doc": {
"category": "technical books"
}
}

The body of the request is the field of the document we want to update and the unique identifier is passed in the URL.

Please note the _update endpoint at the end of the URL.

The response from Elasticsearch is shown here:

{
"_index": "chapter1",
"_type": "product",
"_id": "1",
"_version": 3,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
}
}

As you can see in the response, the operation is successful, and the version of the document is now 3. More complicated update operations are possible using scripts and upserts.

Deleting an existing document

For creating and retrieving a document, we used the POST and GET methods. For deleting an existing document, we need to use the HTTP DELETE method and pass the unique identifier of the document in the URL as shown here:

DELETE http://localhost:9200/chapter1/product/1

Let's inspect the request:

INDEX chapter1
TYPE product
IDENTIFIER 1
HTTP METHOD DELETE

The response from Elasticsearch is shown here:

{
"found": true,
"_index": "chapter1",
"_type": "product",
"_id": "1",
"_version": 4,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
}
}

In the response, you can see that Elasticsearch was able to find the document with the unique identifier 1 and was successful in deleting the document.

You have been reading a chapter from
Learning Elasticsearch
Published in: Jun 2017
Publisher: Packt
ISBN-13: 9781787128453
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $19.99/month. Cancel anytime
Banner background image