Configuring ElasticSearch
One of the reasons ElasticSearch is gaining more and more attention, though of course not the only one, is that getting started with it is quite easy. Thanks to reasonable default values and automatic configuration for simple environments, we can skip the configuration and go straight to the next chapter without changing a single line in our configuration files. However, in order to truly understand ElasticSearch, it is worth understanding some of the available settings.
The whole configuration is located in the config directory. We can see two files there: elasticsearch.yml (or elasticsearch.json, which will be used if present) and logging.yml. The first file is responsible for setting the default configuration values for the server. Note the word default: some of these values can be changed at runtime and kept as a part of the cluster state, so the values in this file may not reflect the running configuration. We will show you how to check the accurate configuration in Chapter 8, Dealing with Problems. The two values that we cannot change at runtime are cluster.name and node.name.
The cluster.name property holds the name of our cluster. The cluster name separates different clusters from each other: nodes configured with the same name will try to form a cluster.
The second value, node.name, is the instance name. We can leave this parameter undefined, in which case ElasticSearch automatically chooses a unique name for itself. Note that this name is chosen during every startup, so the same node can get a different name on each restart. Defining the name can help when referring to concrete instances through the API or when using monitoring tools to see what is happening to a node over long periods of time and between restarts, so think about giving descriptive names to your nodes. Other parameters are well commented in the file, so we advise you to look through it; do not worry if you do not understand every explanation. We hope that everything will become clear after reading the next few chapters.
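To make the two settings concrete, here is a minimal sketch that renders the corresponding lines of elasticsearch.yml. The helper name render_elasticsearch_yml and the example cluster and node names are hypothetical, chosen only for illustration; the two property names come from the text above.

```python
import json


def render_elasticsearch_yml(cluster_name, node_name=None):
    """Return a minimal elasticsearch.yml body as a string.

    Hypothetical helper for illustration only. Values are written as
    double-quoted strings (via json.dumps) so that unusual characters
    in a name do not break the YAML.
    """
    lines = ["cluster.name: " + json.dumps(cluster_name)]
    # node.name is optional: when it is left out, ElasticSearch picks
    # a name on its own at every startup.
    if node_name is not None:
        lines.append("node.name: " + json.dumps(node_name))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Example names are assumptions, not values from the book.
    print(render_elasticsearch_yml("books-production", "search-node-01"), end="")
```

Every node that should join the same cluster would get the same cluster.name value, while node.name would differ from node to node.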
The second file (logging.yml) defines how much information is written to the system logs, defines the log files, and controls how new files are created periodically. Changes in this file are necessary only when you need to adapt to monitoring or backup solutions, or during system debugging.
Let's leave the configuration files for now. An important part of configuration is tuning your operating system. During indexing, especially when you have many shards and replicas, ElasticSearch will create many files, so the system should not limit the number of open file descriptors to less than 32,000. For Linux servers, this can usually be changed in /etc/security/limits.conf and the current value can be displayed using the ulimit command.
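The same check can also be scripted. The following is a minimal sketch, assuming a Unix-like system where Python's standard resource module is available; the function names and the MIN_OPEN_FILES constant are hypothetical, and the 32,000 threshold is the one given in the text above.

```python
import resource  # standard library, available on Unix-like systems only

MIN_OPEN_FILES = 32000  # threshold taken from the text above


def limit_is_sufficient(soft_limit, minimum=MIN_OPEN_FILES):
    """Return True if the given open-file limit meets the minimum."""
    # RLIM_INFINITY means "no limit", which always satisfies the requirement.
    if soft_limit == resource.RLIM_INFINITY:
        return True
    return soft_limit >= minimum


def current_open_file_limit():
    """Return the soft open-file limit of the current process."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft


if __name__ == "__main__":
    soft = current_open_file_limit()
    if limit_is_sufficient(soft):
        print("open file limit looks sufficient:", soft)
    else:
        print("open file limit is below", MIN_OPEN_FILES, "- current value:", soft)
```

Note that this reports the limit of the process running the script. To learn what ElasticSearch itself would get, the script should be run as the same user and from the same kind of session that starts the server.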
The next setting concerns the memory limit for a single instance. The default value (1024MB) may not be sufficient. If you spot entries with OutOfMemoryError in a log file, set the environment variable ES_HEAP_SIZE to a value greater than 1024. Note that this value shouldn't be set to more than 50 percent of the total physical memory available; the rest can be used as disk cache, which greatly increases search performance.
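The 50 percent rule is simple arithmetic, and a short sketch may help when sizing a machine. The function names below are hypothetical, reading physical memory through os.sysconf is an assumption that holds on Linux, and the m suffix on the printed value assumes that ES_HEAP_SIZE accepts JVM-style sizes such as 2048m.

```python
import os

DEFAULT_HEAP_MB = 1024  # default heap size mentioned in the text above


def heap_ceiling_mb(total_physical_mb):
    """Return the largest advisable heap: half of the physical memory."""
    return total_physical_mb // 2


def clamp_heap_mb(requested_mb, total_physical_mb):
    """Cap a requested heap size at the 50 percent ceiling."""
    return min(requested_mb, heap_ceiling_mb(total_physical_mb))


def total_physical_mb():
    """Return the total physical memory in MB (assumes Linux sysconf names)."""
    page_size = os.sysconf("SC_PAGE_SIZE")
    page_count = os.sysconf("SC_PHYS_PAGES")
    return page_size * page_count // (1024 * 1024)


if __name__ == "__main__":
    total = total_physical_mb()
    # 4096 MB is an arbitrary example request, not a recommendation.
    heap = clamp_heap_mb(4096, total)
    # The "m" suffix is an assumption about the accepted value format.
    print("export ES_HEAP_SIZE={}m".format(heap))
```

For example, on a machine with 16 GB of physical memory the ceiling would be 8192 MB, so a request for 4096 MB passes through unchanged, while on a 4 GB machine the same request would be capped at 2048 MB.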