Understanding nodes and clusters
Every instance of ElasticSearch is called a node. Several nodes are grouped in a cluster. This is the basis of the cloud nature of ElasticSearch.
Getting ready
To better understand the following sections, some basic knowledge of the concepts of application nodes and clusters is required.
How it works...
One or more ElasticSearch nodes can be set up on a physical or a virtual server depending on the available resources such as RAM, CPU, and disk space.
A default node both stores data and processes requests and responses. (In Chapter 2, Downloading and Setting Up, we'll see details about how to set up different nodes and cluster topologies.)
When a node starts, several actions take place, such as:
- The configuration is read from the environment variables and the elasticsearch.yml configuration file
- A node name is set by the configuration file or is chosen from a list of built-in random names
- Internally, the ElasticSearch engine initializes all the modules and plugins that are available in the current installation
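The first two startup steps can be illustrated with a minimal Python sketch. It is a toy model, not ElasticSearch code: the function names, the name pool, and the rule that environment values override file values are all assumptions made for illustration.

```python
import random

# Hypothetical name pool; the real built-in list ships with ElasticSearch.
BUILT_IN_NAMES = ["Node-Alpha", "Node-Beta", "Node-Gamma"]


def resolve_settings(file_settings, env_settings):
    """Merge the two configuration sources into one settings dict.

    Assumption of this sketch: environment values win over file values.
    """
    merged = dict(file_settings)
    merged.update(env_settings)
    return merged


def resolve_node_name(settings, rng=random):
    """Use the configured node name, or fall back to a random built-in one."""
    return settings.get("node.name") or rng.choice(BUILT_IN_NAMES)


# A node with an explicit name keeps it; an unnamed node gets a random one.
named = resolve_settings({"cluster.name": "demo"}, {"node.name": "search-01"})
unnamed = resolve_settings({"cluster.name": "demo"}, {})
print(resolve_node_name(named))
print(resolve_node_name(unnamed))
```

Setting an explicit node name makes log files and cluster state output easier to read than a randomly assigned name.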
After the node startup, the node searches for other cluster members and checks its index and shard status.
To join two or more nodes in a cluster, the following rules must be observed:
- The version of ElasticSearch must be the same (v0.20, v0.90, v1.4, and so on) or the join is rejected.
- The cluster name must be the same.
- The network must be configured to support broadcast discovery (it is configured for it by default) and the nodes must be able to communicate with each other. (See the Setting up networking recipe in Chapter 2, Downloading and Setting Up.)
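The join rules above can be sketched as a simple check. This is an illustrative model only: the Node class, the can_join function, and the choice to compare versions by their first two components (for example, 1.4) are assumptions, not the actual ElasticSearch discovery code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical description of a node for this sketch."""
    name: str
    version: str
    cluster_name: str
    reachable: bool = True  # stands in for "the network allows discovery"


def release_line(version):
    """Return the first two version components, e.g. '1.4.2' -> '1.4'."""
    return ".".join(version.split(".")[:2])


def can_join(candidate, member):
    """Return (ok, reasons) for a candidate trying to join a member's cluster."""
    reasons = []
    if release_line(candidate.version) != release_line(member.version):
        reasons.append("version mismatch")
    if candidate.cluster_name != member.cluster_name:
        reasons.append("cluster name mismatch")
    if not (candidate.reachable and member.reachable):
        reasons.append("nodes cannot communicate")
    return (not reasons, reasons)


member = Node("node-1", "1.4.0", "production")
print(can_join(Node("node-2", "1.4.2", "production"), member))
print(can_join(Node("node-3", "0.90.5", "staging"), member))
```

Returning the list of reasons, and not only a Boolean, mirrors how a failed join is usually diagnosed: by checking each rule in turn.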
A common approach in cluster management is to have a master node, which is the main reference for all cluster-level actions, and the other nodes, called secondary nodes, that replicate the master data and its actions.
To keep write operations consistent, all the update actions are first committed on the master node and then replicated to the secondary nodes.
In a cluster with multiple nodes, if a master node dies, a master-eligible node is elected to be the new master node. This approach allows automatic failover to be set up in an ElasticSearch cluster.
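The failover idea can be sketched in a few lines of Python. The selection rule used here (pick the live master-eligible node with the lowest name) is purely an assumption to make the example deterministic. It is not the election algorithm that ElasticSearch implements, and the dictionary keys are hypothetical.

```python
def elect_master(nodes):
    """Pick a new master among live, master-eligible nodes.

    Toy rule for this sketch: the candidate with the lowest name wins.
    Returns None when no candidate is available.
    """
    candidates = [n for n in nodes if n["alive"] and n["master_eligible"]]
    if not candidates:
        return None
    return min(candidates, key=lambda n: n["name"])["name"]


cluster = [
    {"name": "node-1", "master_eligible": True, "alive": True},
    {"name": "node-2", "master_eligible": True, "alive": True},
    {"name": "node-3", "master_eligible": False, "alive": True},
]

master = elect_master(cluster)
print("current master:", master)

# Simulate the death of the current master and elect a replacement.
for node in cluster:
    if node["name"] == master:
        node["alive"] = False
print("new master:", elect_master(cluster))
```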
There's more...
There are two important behaviors in an ElasticSearch node: the non-data node (or arbiter) and the data container behavior.
Non-data nodes are able to process REST requests and all other search operations. During every action execution, ElasticSearch generally executes actions using a map/reduce approach: the non-data node is responsible for distributing the actions to the underlying shards (map) and collecting/aggregating the shard results (reduce) to be able to send a final response. They may use a huge amount of RAM due to operations such as facets, aggregations, collecting hits, and caching (such as scan/scroll queries).
Data nodes are able to store data. They contain the index shards that store the indexed documents as Lucene indices (Lucene is the internal ElasticSearch engine).
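The map/reduce flow can be illustrated with a minimal scatter-gather sketch. Shards are modeled as plain Python lists and scoring is a naive term count. Both are assumptions for illustration and have nothing to do with the real Lucene scoring or the ElasticSearch transport layer.

```python
import heapq


def search_shard(shard_docs, term, size):
    """Map step: score the documents of one shard and return its local top hits."""
    hits = []
    for doc in shard_docs:
        score = doc["text"].lower().split().count(term)
        if score > 0:
            hits.append((score, doc["id"]))
    return heapq.nlargest(size, hits)


def coordinate_search(shards, term, size=3):
    """Reduce step: merge the per-shard hits into one global top list."""
    partial_results = [search_shard(shard, term, size) for shard in shards]
    all_hits = (hit for hits in partial_results for hit in hits)
    return [doc_id for _, doc_id in heapq.nlargest(size, all_hits)]


shards = [
    [{"id": "a", "text": "node node cluster"}, {"id": "b", "text": "cluster"}],
    [{"id": "c", "text": "node"}, {"id": "d", "text": "node node node"}],
]
print(coordinate_search(shards, "node", size=2))
```

The coordinating side holds every shard's partial result in memory before merging, which is one reason why such nodes can need a lot of RAM.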
Using the standard configuration, a node is both an arbiter and a data container.
In big cluster architectures, dedicating some nodes with a lot of RAM and no data to act as simple arbiters reduces the resources required by the data nodes and improves search performance, because the arbiters can use their local memory cache.
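As a rough sketch, the two behaviors can be derived from a node's settings. The example assumes the node.data setting of the ElasticSearch 1.x line covered by this book, with a default of true. The describe_node function is a hypothetical helper and not part of any ElasticSearch API.

```python
def describe_node(settings):
    """Describe a node's behaviors from its settings.

    Assumption: 'node.data' defaults to True, so a node with the
    standard configuration is both an arbiter and a data container.
    """
    behaviors = ["arbiter"]  # every node can coordinate requests
    if settings.get("node.data", True):
        behaviors.append("data container")
    return behaviors


print(describe_node({}))                    # standard configuration
print(describe_node({"node.data": False}))  # dedicated arbiter, no data
```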
See also
- The Setting up different node types recipe in Chapter 2, Downloading and Setting Up.