You're reading from ElasticSearch Cookbook - Second Edition Over 130 advanced recipes to search, analyze, deploy, manage, and monitor data effectively with ElasticSearch

Product type Paperback

Published in Jan 2015

Publisher

ISBN-13 9781783554836

Length 472 pages

Edition 2nd Edition

Languages

Java

Tools

Elasticsearch

Concepts

Enterprise Search

Author (1):

Alberto Paro


Table of Contents (14) Chapters

Preface

1. Getting Started

2. Downloading and Setting Up

3. Managing Mapping

4. Basic Operations

5. Search, Queries, and Filters

6. Aggregations

7. Scripting

8. Rivers

9. Cluster and Node Monitoring

10. Java Integration

11. Python Integration

12. Plugin Development

Index

Understanding nodes and clusters

Every instance of ElasticSearch is called a node, and several nodes grouped together form a cluster. This grouping is the basis of the cloud nature of ElasticSearch.

Getting ready

To better understand the following sections, some basic knowledge of the concepts of application nodes and clusters is required.

How it works...

One or more ElasticSearch nodes can be set up on a physical or a virtual server depending on the available resources such as RAM, CPU, and disk space.

A default node allows you to store data in it and to process requests and responses. (In Chapter 2, Downloading and Setting Up, we'll see details about how to set up different nodes and cluster topologies.)

When a node starts, several actions take place, such as:

The configuration is read from the environment variables and the elasticsearch.yml configuration file
A node name is set by the configuration file or is chosen from a list of built-in random names
Internally, the ElasticSearch engine initializes all the modules and plugins that are available in the current installation
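
The node-naming step above can be pictured with a minimal sketch. It is a toy model, not ElasticSearch's own code: the resolve_node_name function and the BUILT_IN_NAMES list are hypothetical, and only the node.name setting key comes from the ElasticSearch configuration.

```python
import random

# Hypothetical stand-in for the list of built-in random names.
BUILT_IN_NAMES = ["Node Alpha", "Node Beta", "Node Gamma"]


def resolve_node_name(settings, rng=random):
    """Return the configured node name, or a random built-in one.

    `settings` is a plain dict standing in for the values read from
    elasticsearch.yml and the environment.
    """
    name = settings.get("node.name")
    if name:
        return name  # the name was set explicitly in the configuration
    return rng.choice(BUILT_IN_NAMES)  # otherwise pick one at random


if __name__ == "__main__":
    print(resolve_node_name({"node.name": "search-01"}))
    print(resolve_node_name({}))
```

Setting the name explicitly makes a node easy to recognize in logs and monitoring output, whereas a random name may differ between installations.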

After the node startup, the node searches for other cluster members and checks its index and shard status.

To join two or more nodes in a cluster, the following rules must be observed:

The version of ElasticSearch must be the same (v0.20, v0.9, v1.4, and so on) or the join is rejected.
The cluster name must be the same.
The network must be configured to support broadcast discovery (this is the default configuration), and the nodes must be able to communicate with each other. (See the Setting up networking recipe in Chapter 2, Downloading and Setting Up.)
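
The three join rules can be summarized as a small check. The following is only an illustrative sketch: NodeInfo and can_join are hypothetical names, and the reachable flag is an assumed stand-in for working discovery and network communication.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeInfo:
    name: str
    version: str        # for example "1.4"
    cluster_name: str
    reachable: bool = True  # assumed stand-in for working discovery


def can_join(candidate, member):
    """Return (allowed, reason) for a candidate joining a member's cluster."""
    if candidate.version != member.version:
        return False, "version mismatch"
    if candidate.cluster_name != member.cluster_name:
        return False, "cluster name mismatch"
    if not (candidate.reachable and member.reachable):
        return False, "nodes cannot communicate"
    return True, "ok"


if __name__ == "__main__":
    a = NodeInfo("node-a", "1.4", "production")
    b = NodeInfo("node-b", "1.4", "production")
    print(can_join(b, a))
```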

A common approach in cluster management is to have a master node, which is the main reference for all cluster-level actions, and the other nodes, called secondary nodes, that replicate the master data and its actions.

To be consistent in the write operations, all the update actions are first committed in the master node and then replicated in the secondary nodes.

In a cluster with multiple nodes, if a master node dies, a master-eligible node is elected to be the new master node. This approach allows automatic failover to be set up in an ElasticSearch cluster.
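
The failover idea can be sketched as follows. This is a toy model under stated assumptions, not the election algorithm used by ElasticSearch: the elect_master function is hypothetical, and the rule of picking the first eligible name in sorted order is chosen only to keep the example deterministic.

```python
def elect_master(nodes, current_master):
    """Return the name of the node that should act as master.

    `nodes` maps a node name to a dict with two booleans:
    "alive" and "master_eligible".
    """
    current = nodes.get(current_master)
    if current is not None and current["alive"]:
        return current_master  # the master is healthy, no failover needed

    # Only living, master-eligible nodes can take over.
    candidates = sorted(
        name
        for name, state in nodes.items()
        if state["alive"] and state["master_eligible"]
    )
    # Toy rule (an assumption): pick the first name in sorted order.
    return candidates[0] if candidates else None


if __name__ == "__main__":
    cluster = {
        "node-a": {"alive": False, "master_eligible": True},
        "node-b": {"alive": True, "master_eligible": False},
        "node-c": {"alive": True, "master_eligible": True},
    }
    print(elect_master(cluster, "node-a"))
```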

There's more...

An ElasticSearch node can have two important behaviors: the non-data node (or arbiter) behavior and the data container behavior.

Non-data nodes are able to process REST requests and all other search operations. During every action execution, ElasticSearch generally executes actions using a map/reduce approach: the non-data node is responsible for distributing the actions to the underlying shards (map) and for collecting and aggregating the shard results (reduce) so that it can send a final response. These nodes may use a huge amount of RAM due to operations such as facets, aggregations, collecting hits, and caching (such as scan/scroll queries).

Data nodes are able to store data. They contain the index shards that store the indexed documents as Lucene indices (Lucene is the internal engine of ElasticSearch).
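
The map/reduce flow described above can be illustrated with a minimal sketch. It is a simplification under stated assumptions: shards are modeled as plain dicts of document ID to text, the score is a simple term count, and search_shard and coordinate_search are hypothetical names rather than ElasticSearch APIs.

```python
import heapq


def search_shard(shard_docs, term):
    """Map step: a shard scores its own documents for the term."""
    hits = []
    for doc_id, text in shard_docs.items():
        score = text.lower().split().count(term)  # toy score: term count
        if score > 0:
            hits.append((score, doc_id))
    return hits


def coordinate_search(shards, term, size=3):
    """Reduce step: merge the per-shard hits and keep the best `size`."""
    partial_results = [search_shard(shard, term) for shard in shards]
    all_hits = (hit for hits in partial_results for hit in hits)
    best = heapq.nlargest(size, all_hits)
    return [doc_id for _score, doc_id in best]


if __name__ == "__main__":
    shards = [
        {"a": "red red blue", "b": "green"},
        {"c": "red", "d": "red red red"},
    ]
    print(coordinate_search(shards, "red", size=2))
```

The coordinating side holds all partial results in memory before merging them, which hints at why such nodes can need a lot of RAM.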

Using the standard configuration, a node is both an arbiter and a data container.

In big cluster architectures, dedicating some nodes with a lot of RAM and no data to act as simple arbiters reduces the resources required by the data nodes and improves search performance, thanks to the local memory cache of the arbiters.
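
As a rough sketch of how the two behaviors relate to the configuration, the helper below reads the node.data setting (a boolean node setting in elasticsearch.yml, assumed here to default to true) from a plain dict. The node_behaviors function and the behavior labels are hypothetical and only mirror the terms used in this recipe.

```python
def node_behaviors(settings):
    """Return the set of behaviors of a node, in this recipe's terms.

    `settings` is a plain dict standing in for the node configuration.
    """
    behaviors = {"arbiter"}  # in this model every node can handle requests
    if settings.get("node.data", True):  # assumed default: the node holds data
        behaviors.add("data container")
    return behaviors


if __name__ == "__main__":
    print(sorted(node_behaviors({})))                    # standard configuration
    print(sorted(node_behaviors({"node.data": False})))  # arbiter only
```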
