Learning Elasticsearch

Structured and unstructured data using distributed real-time search and analytics

Product type: Paperback
Published: Jun 2017
Publisher: Packt
ISBN-13: 9781787128453
Length: 404 pages
Edition: 1st Edition
Author: Abhishek Andhavarapu

Basic concepts of Elasticsearch

Elasticsearch is a highly scalable open source search engine. Although it started as a text search engine, it is evolving into an analytical engine that supports not only search but also complex aggregations. Its distributed nature and ease of use make it easy to get started and to scale as your data grows.

One might ask what makes Elasticsearch different from other document stores. Elasticsearch is a search engine and not just a key-value store. It's also a very powerful analytical engine; many queries that you would usually run in batch or offline mode can be executed in real time. Support for features such as autocomplete, geolocation-based filters, and multilevel aggregations, coupled with its user-friendliness, has resulted in industry-wide acceptance. That being said, I always believe it is important to have the right tool for the right job. Towards the end of the chapter, we will discuss its strengths and limitations.

In this section, we will go through the basic concepts and terminology of Elasticsearch. We will start by explaining how to insert and update documents and how to perform a search. If you are familiar with SQL, the following table shows the equivalent terms in Elasticsearch:

SQL            Database  Table  Row       Column
Elasticsearch  Index     Type   Document  Field

Document

Your data in Elasticsearch is stored as JSON (JavaScript Object Notation) documents. Many NoSQL data stores use JSON to store their data because the format is concise, flexible, and readily understood by humans. A document in Elasticsearch is very similar to a row in a relational database. Let's say we have a User table with the following information:

Id  Name  Age  Gender  Email
1   Luke  100  M       luke@gmail.com
2   Leia  100  F       leia@gmail.com

The users in the preceding user table, when represented in JSON format, will look like the following:

{
  "id": 1,
  "name": "Luke",
  "age": 100,
  "gender": "M",
  "email": "luke@gmail.com"
}

{
  "id": 2,
  "name": "Leia",
  "age": 100,
  "gender": "F",
  "email": "leia@gmail.com"
}
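As a rough illustration of the row-to-document correspondence, the following Python sketch turns the two rows of the user table into JSON documents. It uses only the standard library; the names `columns`, `rows`, and `row_to_document` are made up for this example and are not part of any Elasticsearch API.

```python
import json

# Column names and rows copied from the user table above.
columns = ["id", "name", "age", "gender", "email"]
rows = [
    (1, "Luke", 100, "M", "luke@gmail.com"),
    (2, "Leia", 100, "F", "leia@gmail.com"),
]


def row_to_document(columns, row):
    """Pair each column name with the row's value to form one document."""
    return dict(zip(columns, row))


documents = [row_to_document(columns, row) for row in rows]

for doc in documents:
    # Each document serializes to one JSON object.
    print(json.dumps(doc, indent=2))
```

Each column name becomes a field name and each cell becomes that field's value, which is the mapping the table of equivalent terms describes.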

A row contains columns; similarly, a document contains fields. Elasticsearch documents are very flexible and support storing nested objects. For example, an existing user document can be easily extended to include the address information. To capture similar information using a table structure, you need to create a new address table and manage the relations using a foreign key. The user document with the address is shown here:

{
  "id": 1,
  "name": "Luke",
  "age": 100,
  "gender": "M",
  "email": "luke@gmail.com",
  "address": {
    "street": "123 High Lane",
    "city": "Big City",
    "state": "Small State",
    "zip": 12345
  }
}
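To make the contrast with a table structure concrete, here is a minimal Python sketch, assuming a hypothetical relational layout in which addresses live in their own table and point back to users through a `user_id` foreign key. The table contents, the `user_id` column, and the `embed_address` helper are all invented for illustration; the sketch only shows how the two rows collapse into the single nested document above.

```python
import json

# Hypothetical relational layout: users and addresses live in separate
# tables, linked by a user_id foreign key on the address row.
users = [
    {"id": 1, "name": "Luke", "age": 100, "gender": "M", "email": "luke@gmail.com"},
]
addresses = [
    {"user_id": 1, "street": "123 High Lane", "city": "Big City",
     "state": "Small State", "zip": 12345},
]


def embed_address(user, addresses):
    """Return a copy of the user with its address nested as an object."""
    document = dict(user)
    for address in addresses:
        if address["user_id"] == user["id"]:
            # Drop the foreign key; nesting makes the relation implicit.
            document["address"] = {
                key: value for key, value in address.items() if key != "user_id"
            }
            break
    return document


user_document = embed_address(users[0], addresses)
print(json.dumps(user_document, indent=2))
```

With the nested form, reading a user's address needs no join: the address travels with the user document.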

Reading similar information without the JSON structure would also be more difficult, as the information would have to be read from multiple tables. Elasticsearch allows you to store the entire JSON as it is. For a database table, the schema has to be defined before you can use the table. Elasticsearch is built to handle unstructured data and can automatically determine the data types for the fields in the document. You can index new documents or add new fields without adding or changing the schema. This process is also known as dynamic mapping. We will discuss how this works and how to define a schema in Chapter 3, Modeling Your Data and Document Relations.
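The idea behind dynamic mapping can be sketched in a few lines of Python: look at each value in a document and guess a type for its field. This is a toy model and not Elasticsearch's implementation. The functions `guess_field_type` and `infer_mapping` are hypothetical, and the type names (`long`, `float`, `boolean`, `text`, `object`) are chosen to resemble Elasticsearch's defaults; the real rules are more involved (for example, strings may be detected as dates), so treat the details as assumptions.

```python
def guess_field_type(value):
    """Guess a type name from a parsed JSON scalar (simplified toy rules)."""
    # bool must be checked before int: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "text"
    return "unknown"


def infer_mapping(document):
    """Build a field -> type mapping, recursing into nested objects."""
    mapping = {}
    for field, value in document.items():
        if isinstance(value, dict):
            mapping[field] = {"type": "object", "properties": infer_mapping(value)}
        else:
            mapping[field] = {"type": guess_field_type(value)}
    return mapping


user = {
    "id": 1,
    "name": "Luke",
    "age": 100,
    "address": {"city": "Big City", "zip": 12345},
}
mapping = infer_mapping(user)
for field, definition in mapping.items():
    print(field, definition)
```

The point of the sketch is only that types can be derived from the data itself, which is why no schema has to exist before the first document is indexed.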

Index

An index is similar to a database. The term index should not be confused with a database index, as someone familiar with traditional SQL might assume. Your data is stored in one or more indexes, just as you would store it in one or more databases. The word indexing means inserting or updating documents in an Elasticsearch index. The name of the index must be unique and written in all lowercase letters. For example, in an e-commerce application, you would have one index for items, one for orders, one for customer information, and so on.

Type

A type is similar to a database table; an index can have one or more types. A type is a logical separation of different kinds of data. For example, if you are building a blog application, you would have a type defined for articles in the blog and a type defined for comments in the blog. Let's say we have two types: articles and comments.

The following is the document that belongs to the article type:

{
  "articleid": 1,
  "name": "Introduction to Elasticsearch"
}

The following is the document that belongs to the comment type:

{
  "commentid": "AVmKvtPwWuEuqke_aRsm",
  "articleid": 1,
  "comment": "Its Awesome !!"
}

We can also define relations between different types. For example, a parent/child relation can be defined between articles and comments. An article (parent) can have one or more comments (children). We will discuss relations further in Chapter 3, Modeling Your Data and Document Relations.
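The parent/child idea can be pictured with plain Python data: each comment carries the `articleid` of the article it belongs to, so comments can be grouped under their parent. This is only an in-memory illustration of the relationship through the shared `articleid` field; it does not show how Elasticsearch stores or queries parent/child documents, and `group_comments_by_article` is a hypothetical helper.

```python
from collections import defaultdict

# Documents of the two types from the blog example.
articles = [
    {"articleid": 1, "name": "Introduction to Elasticsearch"},
]
comments = [
    {"commentid": "AVmKvtPwWuEuqke_aRsm", "articleid": 1, "comment": "Its Awesome !!"},
]


def group_comments_by_article(comments):
    """Collect child comments under the articleid of their parent article."""
    children = defaultdict(list)
    for comment in comments:
        children[comment["articleid"]].append(comment)
    return children


children = group_comments_by_article(comments)
for article in articles:
    related = children.get(article["articleid"], [])
    print(article["name"], "has", len(related), "comment(s)")
```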

Cluster and node

In a traditional database system, we usually have only one server serving all the requests. Elasticsearch is a distributed system, meaning it is made up of one or more nodes (servers) that act as a single application, which enables it to scale and handle load beyond what a single server can handle. Each node (server) has part of the data. You can start running Elasticsearch with just one node and add more nodes, or, in other words, scale the cluster when you have more data. A cluster with three nodes is shown in the following diagram:

In the preceding diagram, the cluster has three nodes with the names elasticsearch1, elasticsearch2, and elasticsearch3. These three nodes work together to handle all the indexing and query requests on the data. Each cluster is identified by a unique name, which defaults to elasticsearch. It is common to have multiple clusters, one for each environment, such as staging, pre-production, and production.

Just like a cluster, each node is identified by a unique name. Elasticsearch will automatically assign a unique name to each node if the name is not specified in the configuration. Depending on your application needs, you can add and remove nodes (servers) on the fly. Adding and removing nodes is seamlessly handled by Elasticsearch.

We will discuss how to set up an Elasticsearch cluster in Chapter 2, Setting Up Elasticsearch and Kibana.

Shard

An index is a collection of one or more shards. All the data that belongs to an index is distributed across multiple shards. By spreading the data that belongs to an index to multiple shards, Elasticsearch can store information beyond what a single server can store. Elasticsearch uses Apache Lucene internally to index and query the data. A shard is nothing but an Apache Lucene instance. We will discuss Apache Lucene and why Elasticsearch uses Lucene in the How search works section later.
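One common way to spread documents across a fixed number of shards is to hash each document's ID and take the remainder modulo the shard count. The Python sketch below illustrates that general idea only. The shard count, the `pick_shard` helper, and the use of CRC32 as the hash are assumptions made for this example and should not be read as Elasticsearch's actual routing logic.

```python
import zlib

NUMBER_OF_SHARDS = 3  # hypothetical shard count for this illustration


def pick_shard(document_id, number_of_shards=NUMBER_OF_SHARDS):
    """Map a document ID to a shard number in [0, number_of_shards)."""
    # crc32 is a stand-in hash; it is deterministic across runs, unlike
    # Python's built-in hash() for strings.
    digest = zlib.crc32(str(document_id).encode("utf-8"))
    return digest % number_of_shards


# Distribute ten document IDs across the shards.
shards = {shard: [] for shard in range(NUMBER_OF_SHARDS)}
for document_id in range(1, 11):
    shards[pick_shard(document_id)].append(document_id)

for shard, ids in shards.items():
    print("shard", shard, "holds documents", ids)
```

Because the same ID always hashes to the same shard, a document can be found again without searching every shard, and each shard holds only a portion of the index.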

I know we introduced a lot of new terms in this section. For now, just remember that all data that belongs to an index is spread across one or more shards. We will discuss how shards work in the Scalability and Availability section towards the end of this chapter.
