Search icon CANCEL
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Conferences
Free Learning
Arrow right icon
Arrow up icon
GO TO TOP
Elasticsearch Server: Second Edition

You're reading from   Elasticsearch Server: Second Edition From creating your own index structure through to cluster monitoring and troubleshooting, this is the complete guide to implementing the ElasticSearch search engine on your own websites. Packed with real-life examples.

Arrow left icon
Product type Paperback
Published in Apr 2014
Publisher
ISBN-13 9781783980529
Length 428 pages
Edition Edition
Languages
Arrow right icon
Toc

Table of Contents (18) Chapters Close

Elasticsearch Server Second Edition
Credits
About the Author
Acknowledgments
About the Author
Acknowledgments
About the Reviewers
www.PacktPub.com
Preface
1. Getting Started with the Elasticsearch Cluster 2. Indexing Your Data FREE CHAPTER 3. Searching Your Data 4. Extending Your Index Structure 5. Make Your Search Better 6. Beyond Full-text Searching 7. Elasticsearch Cluster in Detail 8. Administrating Your Cluster Index

Installing and configuring your cluster


There are a few steps required to install Elasticsearch, which we will explore in the following sections.

Installing Java

In order to set up Elasticsearch, the first step is to make sure that a Java SE environment is installed properly. Elasticsearch requires Java Version 6 or later to run. You can download it from http://www.oracle.com/technetwork/java/javase/downloads/index.html. You can also use OpenJDK (http://openjdk.java.net/) if you wish. You can, of course, use Java Version 6, but it is not supported with patches by default, so we suggest that you install Java 7.

Installing Elasticsearch

To install Elasticsearch, just download it from http://www.elasticsearch.org/download/ and unpack it. Choose the last stable version. That's it! The installation is complete.

Note

At the time of writing this book, we used Elasticsearch 1.0.0.GA. This means that we've skipped describing some properties that were marked as deprecated and are or will be removed in the future versions of Elasticsearch.

The main interface to communicate with Elasticsearch is based on an HTTP protocol and REST. This means that you can even use a web browser for some basic queries and requests, but for anything more sophisticated, you'll need to use additional software such as the cURL command. If you use the Linux or OS X command, the curl package should already be available. If you use Windows, you can download it from http://curl.haxx.se/download.html.

Installing Elasticsearch from binary packages on Linux

The other way to install Elasticsearch is to use the provided binary packages—the RPM or DEB packages, depending on your Linux distribution. The mentioned binary packages can be found at the following URL address: http://www.elasticsearch.org/download/.

Installing Elasticsearch using the RPM package

After downloading the RPM package, you just need to run the following command:

sudo yum elasticsearch-1.0.0.noarch.rpm

It is as simple as that. If everything went well, Elasticsearch should be installed and its configuration file should be stored in /etc/sysconfig/elasticsearch. If your operating system is based on Red Hat, you will be able to use the init script found at /etc/init.d/elasticsearch. If your operating system is a SUSE Linux, you can use the systemctl file found at /bin to start and stop the Elasticsearch service.

Installing Elasticsearch using the DEB package

After downloading the DEB package, all you need to do is run the following command:

sudo dpkg -i elasticsearch-1.0.0.deb

It is as simple as that. If everything went well, Elasticsearch should be installed and its configuration file should be stored in /etc/elasticsearch/elasticsearch.yml. The init script that allows you to start and stop Elasticsearch will be found at /etc/init.d/elasticsearch. Also, there will be files containing environment settings at /etc/default/elasticsearch.

The directory layout

Now, let's go to the newly created directory. We should see the following directory structure:

Directory

Description

bin

The scripts needed for running Elasticsearch instances and for plugin management

config

The directory where configuration files are located

lib

The libraries used by Elasticsearch

After Elasticsearch starts, it will create the following directories (if they don't exist):

Directory

Description

data

Where all the data used by Elasticsearch is stored

logs

The files with information about events and errors

plugins

The location for storing the installed plugins

work

The temporary files used by Elasticsearch

Configuring Elasticsearch

One of the reasons—of course, not the only one—why Elasticsearch is gaining more and more popularity is that getting started with Elasticsearch is quite easy. Because of the reasonable default values and automatic settings for simple environments, we can skip the configuration and go straight to the next chapter without changing a single line in our configuration files. However, in order to truly understand Elasticsearch, it is worth understanding some of the available settings.

We will now explore the default directories and layout of the files provided with the Elasticsearch tar.gz archive. The whole configuration is located in the config directory. We can see two files there: elasticsearch.yml (or elasticsearch.json, which will be used if present) and logging.yml. The first file is responsible for setting the default configuration values for the server. This is important because some of these values can be changed at runtime and can be kept as a part of the cluster state, so the values in this file may not be accurate. The two values that we cannot change at runtime are cluster.name and node.name.

The cluster.name property is responsible for holding the name of our cluster. The cluster name separates different clusters from each other. Nodes configured with the same cluster name will try to form a cluster.

The second value is the instance (the node) name. We can leave this parameter undefined. In this case, Elasticsearch automatically chooses a unique name for itself. Note that this name is chosen during every startup, so the name can be different on each restart. Defining the name can help when referring to concrete instances by the API or when using monitoring tools to see what is happening to a node during long periods of time and between restarts. Think about giving descriptive names to your nodes.

Other parameters are well commented in the file, so we advise you to look through it; don't worry if you do not understand the explanation. We hope that everything will become clear after reading the next few chapters.

Note

Remember that most of the parameters that have been set in the elasticsearch.yml file can be overwritten with the use of Elasticsearch REST API. We will talk about this API in the The update settings API section of Chapter 8, Administrating Your Cluster.

The second file (logging.yml) defines how much information is written to system logs, defines the logfiles, and creates new files periodically. Changes in this file are usually required only when you need to adapt to monitoring or backup solutions or during system debugging; however, if you want to have a more detailed logging, you need to adjust it accordingly.

Let's leave the configuration files for now. An important part of the configuration is tuning your operating system. During the indexing, especially when having many shards and replicas, Elasticsearch will create many files; so, the system cannot limit the open file descriptors to less than 32,000. For Linux servers, this can be usually changed in /etc/security/limits.conf and the current value can be displayed using the ulimit command. If you end up reaching the limit, Elasticsearch will not be able to create new files; so, merging will fail, indexing may fail, and new indices will not be created.

The next set of settings is connected to the Java Virtual Machine (JVM) heap memory limit for a single Elasticsearch instance. For small deployments, the default memory limit (1024 MB) will be sufficient, but for large ones, it will not be enough. If you spot entries that indicate the OutOfMemoryError exceptions in a logfile, set the ES_HEAP_SIZE variable to a value greater than 1024. When choosing the right amount of memory size to be given to the JVM, remember that, in general, no more than 50 percent of your total system memory should be given. However, as with all the rules, there are exceptions. We will discuss this in greater detail later, but you should always monitor your JVM heap usage and adjust it when needed.

Running Elasticsearch

Let's run our first instance that we just downloaded as the ZIP archive and unpacked. Go to the bin directory and run the following commands depending on the OS:

  • Linux or OS X: ./elasticsearch

  • Windows: elasticsearch.bat

Congratulations! Now, we have our Elasticsearch instance up and running. During its work, the server usually uses two port numbers: the first one for communication with the REST API using the HTTP protocol, and the second one for the transport module used for communication in a cluster and in between the native Java client and the cluster. The default port used for the HTTP API is 9200, so we can check the search readiness by pointing the web browser to http://127.0.0.1:9200/. The browser should show a code snippet similar to the following:

{
  "status" : 200,
  "name" : "es_server",
  "version" : {
    "number" : "1.0.0",
    "build_hash" : "a46900e9c72c0a623d71b54016357d5f94c8ea32",
    "build_timestamp" : "2014-02-12T16:18:34Z",
    "build_snapshot" : false,
    "lucene_version" : "4.6"
  },
  "tagline" : "You Know, for Search"
}

The output is structured as a JSON (JavaScript Object Notation) object. If you are not familiar with JSON, please take a minute and read the article available at http://en.wikipedia.org/wiki/JSON.

Note

Elasticsearch is smart. If the default port is not available, the engine binds to the next free port. You can find information about this on the console during booting as follows:

[2013-11-16 11:56:12,101][INFO ][http] [Red Lotus] bound_address {inet[/0:0:0:0:0:0:0:0%0:9200]}, publish_address {inet[/192.168.1.101:9200]}

Note the fragment with [http]. Elasticsearch uses a few ports for various tasks. The interface that we are using is handled by the HTTP module.

Now, we will use the cURL program. For example, to check cluster health, we will use the following command:

curl -XGET http://127.0.0.1:9200/_cluster/health?pretty

The -X parameter is a request method. The default value is GET (so, in this example, we can omit this parameter). Temporarily, do not worry about the GET value; we will describe it in more detail later in this chapter.

As a standard, the API returns information in a JSON object in which new line characters are omitted. The pretty parameter added to our requests forces Elasticsearch to add a new line character to the response, making the response more human friendly. You can try running the preceding query with and without the ?pretty parameter to see the difference.

Elasticsearch is useful in small- and medium-sized applications, but it has been built with large clusters in mind. So, now we will set up our big, two-node cluster. Unpack the Elasticsearch archive in a different directory and run the second instance. If we look at the log, we see what is shown as follows:

[2013-11-16 11:55:16,767][INFO ][cluster.service          ]
[Stane, Obadiah] detected_master [Martha Johansson]
[vswsFRWTSjOa_fy7uPuOMA]
[inet[/192.168.1.19:9300]], added {[Martha Johansson]
[vswsFRWTSjOa_fy7uPuOMA]
[inet[/192.168.1.19:9300]],}, reason: zen-disco-receive(from master
[[Martha Johansson][vswsFRWTSjOa_fy7uPuOMA]
[inet[/192.168.1.19:9300]]]) 

This means that our second instance (named Stane,Obadiah) discovered the previously running instance (named Martha Johansson). Here, Elasticsearch automatically formed a new, two-node cluster.

Note

Note that on some systems, the firewall software may be enabled by default, which may result in the nodes not being able to discover themselves.

Shutting down Elasticsearch

Even though we expect our cluster (or node) to run flawlessly for a lifetime, we may need to restart it or shut it down properly (for example, for maintenance). The following are three ways in which we can shut down Elasticsearch:

  • If your node is attached to the console, just press Ctrl + C

  • The second option is to kill the server process by sending the TERM signal (see the kill command on the Linux boxes and Program Manager on Windows)

  • The third method is to use a REST API

We will focus on the last method now. It allows us to shut down the whole cluster by executing the following command:

curl -XPOST http://localhost:9200/_cluster/nodes/_shutdown

To shut down just a single node, for example, a node with the BlrmMvBdSKiCeYGsiHijdg identifier, we will execute the following command:

curl –XPOST http://localhost:9200/_cluster/nodes/BlrmMvBdSKiCeYGsiHijdg/_shutdown

The identifier of the node can be read either from the logs or using the _cluster/nodes API, with the following command:

curl -XGET http://localhost:9200/_cluster/nodes/

Running Elasticsearch as a system service

Elasticsearch 1.0 can run as a service both on Linux-based systems as well as on Windows-based ones.

Elasticsearch as a system service on Linux

If you have installed Elasticsearch from the provided binary packages, you are already good to go and don't have to worry about anything. However, if you have just downloaded the archive and unpacked Elasticsearch to the directories of your choice, you'll need to put some additional effort. To install Elasticsearch as a Linux system service, we will use the Elasticsearch service wrapper that can be downloaded from https://github.com/elasticsearch/elasticsearch-servicewrapper.

Let's look at the steps to use the Elasticsearch service wrapper in order to set up a Linux service for Elasticsearch. First, we will run the following command to download the wrapper:

curl -L http://github.com/elasticsearch/elasticsearch-servicewrapper/tarball/master | tar -xz

Assuming that Elasticsearch has been installed in /usr/local/share/elasticsearch, we will run the following command to move the needed service wrapper files:

sudo mv *servicewrapper*/service /usr/local/share/elasticsearch/bin/

We will remove the remaining wrapper files by running the following command:

rm -Rf *servicewrapper*

Finally, we will install the service by running the install command as follows:

sudo /usr/local/share/elasticsearch/bin/service/elasticsearch install

After this, we need to create a symbolic link to the /usr/local/share/elasticsearch/bin/service/elasticsearch script in /usr/local/bin/rcelasticsearch. We do this by running the following command:

sudo ln -s 'readlink -f /usr/local/share/elasticsearch/bin/service/elasticsearch' /usr/local/bin/rcelasticsearch

And that's all. If you want to start Elasticsearch, just run the following command:

/etc/init.d/elasticsearch start

Elasticsearch as a system service on Windows

Installing Elasticsearch as a system service on Windows is very easy. You just need to go to your Elasticsearch installation directory, then go to the bin subdirectory, and run the following command:

service.bat install

You'll be asked about the permission to do so. If you allow the script to run, Elasticsearch will be installed as a Windows service.

If you would like to see all the commands exposed by the service.bat script file, just run the following command in the same directory as earlier:

service.bat

For example, to start Elasticsearch, we will just run the following command:

service.bat start
lock icon The rest of the chapter is locked
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at £16.99/month. Cancel anytime