Installing and configuring your cluster
Installing and running Elasticsearch, even in production environments, is very easy nowadays compared to how it was in the days of Elasticsearch 0.20.x. Going from a bare system to one running Elasticsearch takes only a few steps, which we will explore in the following sections:
Installing Java
Elasticsearch is a Java application, so to use it we need to make sure that the Java SE environment is installed properly. Elasticsearch requires Java Version 7 or later to run. You can download it from http://www.oracle.com/technetwork/java/javase/downloads/index.html. You can also use OpenJDK (http://openjdk.java.net/) if you wish. You can, of course, use Java Version 7, but it is no longer supported by Oracle, at least without commercial support; for example, you can't expect new, patched releases of Java 7. Because of this, we strongly suggest that you install Java 8, especially given that Java 9 seems to be right around the corner, with general availability planned for September 2016.
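Before installing Elasticsearch, it is worth checking which Java version is already available. A minimal sketch (the `java_major` helper is our own, not part of any tool) that extracts the major version from a version string such as the one reported by `java -version`:

```shell
# Extract the Java major version from a version string.
# Strings like "1.8.0_72" mean Java 8; newer JVMs report "9", "11.0.2", etc.
java_major() {
  case "$1" in
    1.*) echo "$1" | cut -d. -f2 ;;
    *)   echo "$1" | cut -d. -f1 ;;
  esac
}

java_major "1.8.0_72"   # prints: 8
java_major "1.7.0_80"   # prints: 7
```

In practice, you would feed it the version string parsed from the output of `java -version 2>&1` and refuse to proceed if the result is below 7.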
Installing Elasticsearch
To install Elasticsearch you just need to go to https://www.elastic.co/downloads/elasticsearch, choose the last stable version of Elasticsearch, download it, and unpack it. That's it! The installation is complete.
Note
At the time of writing, we used a snapshot of Elasticsearch 2.2. This means that we've skipped describing some properties that were marked as deprecated and have been or will be removed in future versions of Elasticsearch.
The main interface for communicating with Elasticsearch is based on the HTTP protocol and REST. This means that you can even use a web browser for some basic queries and requests, but for anything more sophisticated you'll need additional software, such as the cURL
command. If you use Linux or OS X, the cURL
package should already be available. If you use Windows, you can download it from http://curl.haxx.se/download.html.
Running Elasticsearch
Let's run our first instance from the ZIP archive that we just downloaded and unpacked. Go to the bin
directory and run the following command, depending on the OS:
- Linux or OS X:
./elasticsearch
- Windows:
elasticsearch.bat
Congratulations! Now you have your Elasticsearch instance up and running. During its work, the server usually uses two port numbers: one for communication with the REST API using the HTTP protocol, and one for the transport module, which is used for communication within a cluster and between the native Java client and the cluster. The default port used for the HTTP API is 9200, so we can check that the server is ready by pointing a web browser to http://127.0.0.1:9200/. The browser should show a code snippet similar to the following:
{
  "name" : "Blob",
  "cluster_name" : "elasticsearch",
  "version" : {
    "number" : "2.2.0",
    "build_hash" : "5b1dd1cf5a1957682d84228a569e124fedf8e325",
    "build_timestamp" : "2016-01-13T18:12:26Z",
    "build_snapshot" : true,
    "lucene_version" : "5.4.0"
  },
  "tagline" : "You Know, for Search"
}
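Because the response is plain JSON, you can pull single fields out of it from the command line. A rough sketch using sed (naive and fragile for nested JSON in general, but fine for this flat response; the sample string below is abbreviated):

```shell
# Extract the "number" field from the info response (naive, sed-based).
response='{ "name" : "Blob", "version" : { "number" : "2.2.0" }, "tagline" : "You Know, for Search" }'
version=$(printf '%s' "$response" | sed -n 's/.*"number" : "\([^"]*\)".*/\1/p')
echo "$version"   # prints: 2.2.0
```

In a real script you would capture `response` from a cURL call instead of a hard-coded string.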
The output is structured as a JavaScript Object Notation (JSON) object. If you are not familiar with JSON, please take a minute and read the article available at https://en.wikipedia.org/wiki/JSON.
Note
Elasticsearch is smart. If the default port is not available, the engine binds to the next free port. You can find information about this on the console during booting as follows:
[2016-01-13 20:04:49,953][INFO ][http] [Blob] publish_address {127.0.0.1:9201}, bound_addresses {[fe80::1]:9200}, {[::1]:9200}, {127.0.0.1:9201}
Note the fragment with [http]
. Elasticsearch uses a few ports for various tasks. The interface that we are using is handled by the HTTP module.
Now, we will use the cURL program to communicate with Elasticsearch. For example, to check the cluster health, we will use the following command:
curl -XGET 'http://127.0.0.1:9200/_cluster/health?pretty'
The -X
parameter defines the HTTP request method. The default value is GET
(so in this example, we could omit the parameter). For now, do not worry about the GET
value; we will describe it in more detail later in this chapter.
By default, the API returns information as a JSON object in which newline characters are omitted. The pretty
parameter added to our request forces Elasticsearch to add newline characters to the response, making it more user-friendly. You can try running the preceding query with and without the ?pretty
parameter to see the difference.
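If you prefer not to add ?pretty to every request, you can also pretty-print a compact response locally; a sketch, assuming Python 3 is installed on your machine:

```shell
# Pipe a compact JSON response through Python's json.tool to pretty-print it.
# In practice, the echo would be replaced with a curl call to Elasticsearch.
echo '{"cluster_name":"elasticsearch","status":"green"}' | python3 -m json.tool
```

The output is the same JSON, indented one key per line.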
Elasticsearch is useful in small and medium-sized applications, but it has been built with large clusters in mind. So, now we will set up our big two-node cluster. Unpack the Elasticsearch archive in a different directory and run the second instance. If we look at the log, we will see the following:
[2016-01-13 20:07:58,561][INFO ][cluster.service ] [Big Man] detected_master {Blob}{5QPh00RUQraeLHAInbR4Jw}{127.0.0.1}{127.0.0.1:9300}, added {{Blob}{5QPh00RUQraeLHAInbR4Jw}{127.0.0.1}{127.0.0.1:9300},}, reason: zen-disco-receive(from master [{Blob}{5QPh00RUQraeLHAInbR4Jw}{127.0.0.1}{127.0.0.1:9300}])
This means that our second instance (named Big Man) discovered the previously running instance (named Blob). Here, Elasticsearch automatically formed a new two-node cluster. Starting from Elasticsearch 2.0, this will only work with nodes running on the same physical machine—because Elasticsearch 2.0 no longer supports multicast. To allow your cluster to form, you need to inform Elasticsearch about the nodes that should be contacted initially using the discovery.zen.ping.unicast.hosts
array in elasticsearch.yml
. For example, like this:
discovery.zen.ping.unicast.hosts: ["192.168.2.1", "192.168.2.2"]
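Putting the pieces together, a minimal elasticsearch.yml sketch for the second node of such a cluster might look as follows (the cluster name, node name, and addresses are our own placeholders; both nodes must share the same cluster.name):

```yaml
# Hypothetical settings for the second node of a two-node cluster.
cluster.name: my-cluster
node.name: node-2
network.host: 192.168.2.2
discovery.zen.ping.unicast.hosts: ["192.168.2.1", "192.168.2.2"]
```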
Shutting down Elasticsearch
Even though we expect our cluster (or node) to run flawlessly for a lifetime, we may need to restart it or shut it down properly (for example, for maintenance). The following are the two ways in which we can shut down Elasticsearch:
- If your node is attached to the console, just press Ctrl + C
- The second option is to kill the server process by sending the TERM signal (see the kill
command on Linux boxes and Program Manager on Windows)

Note
The previous versions of Elasticsearch exposed a dedicated shutdown API but, in 2.0, this option has been removed because of security reasons.
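The TERM-based shutdown can be scripted. A sketch (the pgrep pattern in the comment is an assumption about how the Elasticsearch process appears in the process list):

```shell
# Send SIGTERM to a process and wait until it exits.
stop_gracefully() {
  kill -TERM "$1" 2>/dev/null
  wait "$1" 2>/dev/null || true   # wait only works for children of this shell
}

# Against a local Elasticsearch instance, you would use something like:
#   stop_gracefully "$(pgrep -f org.elasticsearch.bootstrap.Elasticsearch)"
```

Sending TERM (rather than KILL) lets Elasticsearch close indices and release resources cleanly before exiting.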
The directory layout
Now, let's go to the newly created directory. We should see the following directory structure:
Directory | Description |
---|---|
bin | The scripts needed to run Elasticsearch instances and for plugin management |
config | The directory where the configuration files are located |
lib | The libraries used by Elasticsearch |
modules | The plugins bundled with Elasticsearch |
After Elasticsearch starts, it will create the following directories (if they don't exist):
Directory | Description |
---|---|
data | The directory used by Elasticsearch to store all the data |
logs | The files with information about events and errors |
plugins | The location to store the installed plugins |
work | The temporary files used by Elasticsearch |
Configuring Elasticsearch
One of the reasons—of course, not the only one—why Elasticsearch is gaining more and more popularity is that getting started with Elasticsearch is quite easy. Because of the reasonable default values and automatic settings for simple environments, we can skip the configuration and go straight to indexing and querying (or to the next chapter of the book). We can do all this without changing a single line in our configuration files. However, in order to truly understand Elasticsearch, it is worth understanding some of the available settings.
We will now explore the default directories and the layout of the files provided with the Elasticsearch tar.gz
archive. The entire configuration is located in the config
directory. We can see two files here: elasticsearch.yml
(or elasticsearch.json
, which will be used if present) and logging.yml
. The first file is responsible for setting the default configuration values for the server. This is important because some of these values can be changed at runtime and can be kept as a part of the cluster state, so the values in this file may not be accurate. The two values that we cannot change at runtime are cluster.name
and node.name
.
The cluster.name
property is responsible for holding the name of our cluster. The cluster name separates different clusters from each other. Nodes configured with the same cluster name will try to form a cluster.
The second value is the instance name (the node.name
property). We can leave this parameter undefined, in which case Elasticsearch automatically chooses a unique name for itself. Note that this name is chosen during each startup, so it can be different on each restart. Defining the name can be helpful when referring to concrete instances via the API, or when using monitoring tools to see what is happening to a node over long periods of time and between restarts. Think about giving descriptive names to your nodes.
Other parameters are commented well in the file, so we advise you to look through it; don't worry if you do not understand the explanation. We hope that everything will become clearer after reading the next few chapters.
Note
Remember that most of the parameters that have been set in the elasticsearch.yml
file can be overwritten with the use of the Elasticsearch REST API. We will talk about this API in The update settings API section of Chapter 9, Elasticsearch Cluster in Detail.
The second file (logging.yml
) defines how much information is written to the system logs, where the log files are stored, and how often new files are created. Changes to this file are usually required only when you need to adapt to monitoring or backup solutions, or during system debugging; however, if you want more detailed logging, you need to adjust it accordingly.
Let's leave the configuration files for now and look at the base for all applications: the operating system. Tuning your operating system is one of the keys to ensuring that your Elasticsearch instance works well. During indexing, especially with many shards and replicas, Elasticsearch creates many files, so the system must not limit the number of open file descriptors to fewer than 32,000. On Linux servers, this can usually be changed in /etc/security/limits.conf
, and the current value can be displayed using the ulimit
command. If you end up reaching the limit, Elasticsearch will not be able to create new files; merging will fail, indexing may fail, and new indices will not be created.
Note
On Microsoft Windows platforms, the default limit is more than 16 million handles per process, which should be more than enough. You can read more about file handles on the Microsoft Windows platform at https://blogs.technet.microsoft.com/markrussinovich/2009/09/29/pushing-the-limits-of-windows-handles/.
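You can check the limit for the current shell before starting Elasticsearch; a small sketch (the 32,000 threshold comes from the recommendation above):

```shell
# Warn if the open file descriptor limit is below the recommended minimum.
limit=$(ulimit -n)
if [ "$limit" != "unlimited" ] && [ "$limit" -lt 32000 ]; then
  echo "Warning: open file limit is $limit; raise it in /etc/security/limits.conf"
fi
```

Remember that limits.conf changes apply to new login sessions, so re-login (or restart the service) after editing it.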
The next set of settings is connected to the Java Virtual Machine (JVM) heap memory limit for a single Elasticsearch instance. For small deployments, the default memory limit (1,024 MB) will be sufficient, but for large ones it will not be enough. If you spot entries indicating OutOfMemoryError
exceptions in a log file, set the ES_HEAP_SIZE
variable to a value greater than 1024. When choosing the right amount of memory to give to the JVM, remember that, in general, no more than 50 percent of your total system memory should be used. However, as with all rules, there are exceptions. We will discuss this in greater detail later, but you should always monitor your JVM heap usage and adjust it when needed.
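The 50 percent guideline can be turned into a simple calculation; a sketch (the helper name is our own, and the 50 percent figure follows the rule of thumb above):

```shell
# Suggest a heap size: half the given total system memory, in megabytes.
suggest_heap_mb() {
  echo $(( $1 / 2 ))
}

suggest_heap_mb 8192    # prints: 4096

# You would then start Elasticsearch with, for example:
#   ES_HEAP_SIZE="$(suggest_heap_mb 8192)m" ./bin/elasticsearch
```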
The system-specific installation and configuration
Although downloading an archive with Elasticsearch and unpacking it works and is convenient for testing, there are dedicated methods for Linux operating systems that give you several advantages in production deployments. In production, the Elasticsearch service should start automatically with the system boot; we should have dedicated start and stop scripts, unified paths, and so on. Elasticsearch provides installation packages for various Linux distributions that we can use. Let's see how this works.
Installing Elasticsearch on Linux
The other way to install Elasticsearch on a Linux operating system is to use packages such as RPM or DEB, depending on your Linux distribution and the supported package type. This way, the installation automatically adapts to the system directory layout; for example, configuration and logs go to their standard places in the /etc/ and /var/log
directories. But this is not the only benefit. When using packages, Elasticsearch will also install startup scripts, making our life easier. What's more, we will be able to upgrade Elasticsearch easily by running a single command from the command line. Of course, the mentioned packages can be found at the same URL we mentioned previously when talking about installing Elasticsearch from zip
or tar.gz
packages: https://www.elastic.co/downloads/elasticsearch. Elasticsearch can also be installed from remote repositories via standard distribution tools such as apt-get
or yum
.
Note
Before installing Elasticsearch, make sure that you have a proper version of the Java Virtual Machine installed.
Installing Elasticsearch using RPM packages
When using a Linux distribution that supports RPM packages, such as Fedora Linux (https://getfedora.org/), Elasticsearch installation is very easy. After downloading the RPM package, we just need to run the following command as root:
yum install elasticsearch-2.2.0.noarch.rpm
Alternatively, you can add the remote repository and install Elasticsearch from it (this command needs to be run as root as well):
rpm --import https://packages.elastic.co/GPG-KEY-elasticsearch
This command adds the GPG key and allows the system to verify that the fetched package really comes from Elasticsearch developers. In the second step, we need to create the repository definition in the /etc/yum.repos.d/elasticsearch.repo
file. We need to add the following entries to this file:
[elasticsearch-2.2]
name=Elasticsearch repository for 2.2.x packages
baseurl=http://packages.elastic.co/elasticsearch/2.x/centos
gpgcheck=1
gpgkey=http://packages.elastic.co/GPG-KEY-elasticsearch
enabled=1
Now it's time to install the Elasticsearch server, which is as simple as running the following command (again, don't forget to run it as root):
yum install elasticsearch
Elasticsearch will be automatically downloaded, verified, and installed.
Installing Elasticsearch using the DEB package
When using a Linux distribution that supports DEB packages (such as Debian), installing Elasticsearch is again very easy. After downloading the DEB package, all you need to do is run the following command:
sudo dpkg -i elasticsearch-2.2.0.deb
It is as simple as that. Another way, similar to what we did with RPM packages, is to create a new package source and install Elasticsearch from the remote repository. The first step is to add the public GPG key used for package verification. We can do that using the following command:
wget -qO - https://packages.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -
The second step is adding the DEB package location. We need to add the following line to the /etc/apt/sources.list
file:
deb http://packages.elastic.co/elasticsearch/2.x/debian stable main
This defines the source for the Elasticsearch packages. The last step is updating the list of remote packages and installing Elasticsearch using the following command:
sudo apt-get update && sudo apt-get install elasticsearch
Elasticsearch configuration file location
When using packages to install Elasticsearch, the configuration files are in slightly different directories than the default config
directory. After the installation, the configuration files should be stored in the following locations:
- /etc/sysconfig/elasticsearch or /etc/default/elasticsearch: A file with the configuration of the Elasticsearch process, such as the user to run as, the directories for logs and data, and the memory settings
- /etc/elasticsearch/: A directory for the Elasticsearch configuration files, such as the elasticsearch.yml file
Configuring Elasticsearch as a system service on Linux
If everything goes well, you can run Elasticsearch using the following command:
/bin/systemctl start elasticsearch.service
If you want Elasticsearch to start automatically every time the operating system starts, you can set up Elasticsearch as a system service by running the following command:
/bin/systemctl enable elasticsearch.service
Elasticsearch as a system service on Windows
Installing Elasticsearch as a system service on Windows is also very easy. You just need to go to your Elasticsearch installation directory, then go to the bin
subdirectory, and run the following command:
service.bat install
You'll be asked for permission to do so. If you allow the script to run, Elasticsearch will be installed as a Windows service.
If you would like to see all the commands exposed by the service.bat
script file, just run the following command in the same directory as earlier:
service.bat
For example, to start Elasticsearch, we will just run the following command:
service.bat start