The master nodes that we have seen previously are the most important nodes for cluster stability. To prevent queries and aggregations from destabilizing the cluster, coordinator (also called client or proxy) nodes can be used as a safe entry point for communicating with the cluster.
Setting up a coordinator node
Getting ready
You need a working Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe in this chapter, and a simple text editor to change configuration files.
How to do it…
For an advanced cluster setup, some parameters must be configured to define the different node types.
These parameters are in the config/elasticsearch.yml file, and you can set up a coordinator node with the following steps:
- Set up the node so that it's not a master, as follows:
node.master: false
- Set up the node to not contain data, as follows:
node.data: false
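For reference, a fuller elasticsearch.yml for such a node might look like the sketch below. It assumes Elasticsearch 7.x; the cluster name, node name and host names are hypothetical placeholders. Beyond the two settings above, the Elasticsearch documentation also disables node.ingest for a coordinating-only node, and from version 7.9 the legacy role settings are deprecated in favor of node.roles, where an empty list means "coordinating only".

```yaml
# config/elasticsearch.yml on the coordinator node (illustrative sketch, Elasticsearch 7.x assumed)

# Must match the existing cluster so that the node joins it
cluster.name: my-cluster      # hypothetical: use your cluster's real name
node.name: coordinator-1      # hypothetical node name

# Coordinating-only roles (pre-7.9 style settings)
node.master: false            # never elected as master
node.data: false              # holds no shards, so its loss triggers no relocation
node.ingest: false            # runs no ingest pipelines

# Master-eligible nodes to contact when joining (hosts are hypothetical)
discovery.seed_hosts: ["master-1:9300", "master-2:9300", "master-3:9300"]

# On 7.9 and later, replace the three node.* role settings above with the line below
# (do not combine both styles in the same file):
# node.roles: [ ]
```

After a restart, GET _cat/nodes?v lists a coordinating-only node with `-` in the node.role column.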
How it works…
The coordinator node is a special node that works as a proxy (pass-through) for the cluster. Its main advantages are as follows:
- It can easily be killed or removed from the cluster without causing any problems. It's not a master, so it doesn't participate in cluster management, and it doesn't contain data, so its failure causes no data relocation or replication.
- It protects the cluster from instability caused by bad queries from developers or users. Sometimes, a user executes aggregations that are too large (for example, date histograms spanning several years with 10-second intervals), and the Elasticsearch node executing them could crash. (Recent versions of Elasticsearch include a structure called a circuit breaker to prevent similar issues, but there are always borderline cases, such as scripting, that can still bring instability.) Because the coordinator node is not a master, overloading it does not cause problems for cluster stability.
- If the coordinator or client node is embedded in the application, there are fewer round trips for the data, which speeds up the application.
- You can add coordinator nodes to scale search and aggregation throughput without causing cluster changes or data relocation.
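The circuit breaker mentioned above, together with the related search.max_buckets limit, is controlled by ordinary settings. The following is a minimal sketch of tightening them in elasticsearch.yml; the values are illustrative assumptions, not recommendations, so check the defaults for your version before changing them.

```yaml
# config/elasticsearch.yml: guard rails against oversized requests (illustrative values)

# Parent breaker: cap on the total memory that all child breakers may reserve
indices.breaker.total.limit: 70%

# Request breaker: memory for per-request structures such as aggregation buckets
indices.breaker.request.limit: 40%

# Maximum number of buckets a single search response may contain
search.max_buckets: 10000
```

With a limit of 10,000 buckets, the date histogram from the example above (one year at 10-second intervals is about 3.15 million buckets) is rejected with an error instead of exhausting the heap.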