You're reading from Elasticsearch Essentials Harness the power of ElasticSearch to build and manage scalable search and analytics solutions with this fast-paced guide

Product type Paperback

Published in Jan 2016

ISBN-13 9781784391010

Length 240 pages

Edition 1st Edition

Installing and configuring Elasticsearch

I have used Elasticsearch version 2.0.0 in this book; you can install another version if you wish. You just need to replace 2.0.0 with that version number in the commands that follow. You need an administrative account to perform the installation and configuration.

Installing Elasticsearch on Ubuntu through Debian package

Let's get started with installing Elasticsearch on Ubuntu Linux. The steps will be the same for all Ubuntu versions:

Download the Elasticsearch Version 2.0.0 Debian package:

wget https://download.elastic.co/elasticsearch/elasticsearch/elasticsearch-2.0.0.deb

Install Elasticsearch, as follows:
```
sudo dpkg -i elasticsearch-2.0.0.deb
```
To run Elasticsearch as a service (to ensure Elasticsearch starts automatically when the system is booted), do the following:
```
sudo update-rc.d elasticsearch defaults 95 10
```

Installing Elasticsearch on CentOS through the RPM package

Follow these steps to install Elasticsearch on CentOS machines. If you are using any other Red Hat-based Linux distribution, you can use the same commands:

Download the Elasticsearch Version 2.0.0 RPM package:

wget https://download.elastic.co/elasticsearch/elasticsearch/elasticsearch-2.0.0.rpm

Install Elasticsearch, using this command:
```
sudo rpm -i elasticsearch-2.0.0.rpm
```
To run Elasticsearch as a service (to ensure Elasticsearch starts automatically when the system is booted), use the following:
```
sudo systemctl daemon-reload
sudo systemctl enable elasticsearch.service
```
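Both installations above follow the same pattern: build the package URL from the version number, download it, and install it. As a minimal sketch, the version substitution could be wrapped in a small shell function. The name `es_package_url` is hypothetical, and the sketch assumes that other versions are published under the same URL pattern as 2.0.0, which may not hold for every release.

```shell
#!/usr/bin/env bash
# Print the download URL for a given Elasticsearch version and package type.
# Assumption: the URL pattern used for 2.0.0 also applies to the requested version.
es_package_url() {
    local version="$1"    # for example 2.0.0
    local pkg_type="$2"   # deb or rpm
    echo "https://download.elastic.co/elasticsearch/elasticsearch/elasticsearch-${version}.${pkg_type}"
}

# Show the two URLs used in this section (nothing is downloaded here).
es_package_url 2.0.0 deb
es_package_url 2.0.0 rpm
```

The output could then be passed to wget, for example `wget "$(es_package_url 2.0.0 deb)"`.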

Understanding the Elasticsearch installation directory layout

The following table shows the directory layout that is created after installation. Some paths differ slightly depending on the Linux distribution you are using.

| Description | Path on Debian/Ubuntu | Path on RHEL/CentOS |
| --- | --- | --- |
| Elasticsearch home directory | `/usr/share/elasticsearch` | `/usr/share/elasticsearch` |
| Elasticsearch and Lucene jar files | `/usr/share/elasticsearch/lib` | `/usr/share/elasticsearch/lib` |
| Contains plugins | `/usr/share/elasticsearch/plugins` | `/usr/share/elasticsearch/plugins` |
| The binary scripts that are used to start an ES node and download plugins | `/usr/share/elasticsearch/bin` | `/usr/share/elasticsearch/bin` |
| Contains the Elasticsearch configuration files (`elasticsearch.yml` and `logging.yml`) | `/etc/elasticsearch` | `/etc/elasticsearch` |
| Contains the data files of the indices/shards allocated on that node | `/var/lib/elasticsearch/data` | `/var/lib/elasticsearch/data` |
| The startup script for Elasticsearch (contains environment variables, including heap size and file descriptors) | `/etc/init.d/elasticsearch` | `/etc/sysconfig/elasticsearch` or `/etc/init.d/elasticsearch` |
| Contains the log files of Elasticsearch | `/var/log/elasticsearch/` | `/var/log/elasticsearch/` |
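To see which of these directories exist on a given machine, a loop over a few of the paths from the table is enough. The following is only a sketch: the function name `check_es_layout` and its optional path-prefix argument are hypothetical, and only three of the listed directories are checked.

```shell
#!/usr/bin/env bash
# Report whether a few of the directories from the table exist.
# The optional first argument is a hypothetical path prefix, useful for trying
# the function against a scratch directory instead of the real filesystem.
check_es_layout() {
    local root="${1:-}"
    local dir
    for dir in /usr/share/elasticsearch /etc/elasticsearch /var/log/elasticsearch; do
        if [ -d "${root}${dir}" ]; then
            echo "found ${dir}"
        else
            echo "missing ${dir}"
        fi
    done
}

# Check the real filesystem.
check_es_layout
```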

During installation, a user and a group named elasticsearch are created by default. Elasticsearch does not start automatically after installation; this prevents the new node from accidentally joining an already running cluster that has the same cluster name.

Tip

It is recommended to change the cluster name before starting Elasticsearch for the first time.

Configuring basic parameters

Open the elasticsearch.yml file, which contains most of the Elasticsearch configuration options:
```
sudo vim /etc/elasticsearch/elasticsearch.yml
```
Now, edit the following parameters:
- cluster.name: The name of your cluster
- node.name: The name of the node
- path.data: The path where Elasticsearch stores its data
Note
Similar to path.data, we can change path.logs and path.plugins as well. Make sure all of these parameter values are enclosed in double quotes.
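As an illustration of what the edited lines might look like, the following sketch writes three sample parameters to a scratch file. The function name `write_sample_config` and every value shown (`my_cluster`, `node_1`, `/data/elasticsearch`) are placeholders, not recommendations.

```shell
#!/usr/bin/env bash
# Write a sample set of basic parameters to the file given as the first argument.
# All values below are hypothetical placeholders.
write_sample_config() {
    local target="$1"
    cat > "$target" <<'EOF'
cluster.name: "my_cluster"
node.name: "node_1"
path.data: "/data/elasticsearch"
EOF
}

# Write to a scratch file rather than /etc/elasticsearch/elasticsearch.yml.
sample_file="$(mktemp)"
write_sample_config "$sample_file"
cat "$sample_file"
```

Writing to a scratch file first lets you review the lines before copying them into the real configuration file.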
After saving the elasticsearch.yml file, start Elasticsearch:
```
sudo service elasticsearch start
```
Elasticsearch will start on two ports, as follows:
- 9200: This is used to create HTTP connections
- 9300: This is used for TCP connections from Java clients and for communication between the nodes inside a cluster

Tip
Do not forget to uncomment the lines you have edited. Also note that if you are using a new data path instead of the default one, you first need to change the owner and the group of that data path to the elasticsearch user.
The command to change the directory ownership to elasticsearch is as follows:
sudo chown -R elasticsearch:elasticsearch data_directory_path
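To confirm that the ownership change took effect, you can compare the owner of the directory with the ID of the elasticsearch user. This is a rough sketch; the helper names `dir_owner_uid` and `owned_by_user` are hypothetical.

```shell
#!/usr/bin/env bash
# Print the numeric user ID that owns the given directory.
dir_owner_uid() {
    ls -ldn "$1" | awk '{ print $3 }'
}

# Succeed only if the directory is owned by the named user.
# Fails if the user does not exist on this system.
owned_by_user() {
    local dir="$1" user="$2"
    local uid
    uid="$(id -u "$user" 2>/dev/null)" || return 1
    [ "$(dir_owner_uid "$dir")" = "$uid" ]
}

# Example usage (data_directory_path is the placeholder used in the text):
#   owned_by_user data_directory_path elasticsearch && echo "ownership looks right"
```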
Run the following command to check whether Elasticsearch has been started properly:
```
sudo service elasticsearch status
```
If the preceding command reports that elasticsearch is not running, there is most likely a configuration issue. You can open the log file to see what is causing the error.

Possible issues that might prevent Elasticsearch from starting include:

- A Java issue, as discussed previously
- Indentation issues in the elasticsearch.yml file
- Less than 1 GB of RAM is free for Elasticsearch to use
- The ownership of the data directory path has not been changed to elasticsearch
- Something is already running on port 9200 or 9300
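Two of the issues in this list can be checked quickly from the shell before you look at the logs. The sketch below uses hypothetical helper names, treats any tab character in the file as a likely indentation problem (YAML does not allow tabs for indentation), and assumes a Linux kernel that reports a `MemAvailable` line in `/proc/meminfo`.

```shell
#!/usr/bin/env bash
# Succeed if the given YAML file contains a tab character anywhere.
yaml_has_tabs() {
    grep -q "$(printf '\t')" "$1"
}

# Print available memory in whole megabytes, read from a meminfo-style file.
# Assumption: the file has a MemAvailable line with a value in kB.
available_mb() {
    local meminfo="${1:-/proc/meminfo}"
    awk '/^MemAvailable:/ { print int($2 / 1024) }' "$meminfo"
}

# Succeed if at least 1 GB (1024 MB) is reported as available.
has_enough_memory() {
    local mb
    mb="$(available_mb "${1:-}")"
    [ -n "$mb" ] && [ "$mb" -ge 1024 ]
}

# Example usage:
#   yaml_has_tabs /etc/elasticsearch/elasticsearch.yml && echo "tabs found"
#   has_enough_memory || echo "less than 1 GB available"
```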

Adding another node to the cluster

Adding another node to a cluster is simple: follow all the installation steps on another system to install a new instance of Elasticsearch. However, keep the following in mind:

- In the elasticsearch.yml file, cluster.name is set to the same value on both nodes
- Both systems are reachable from each other over the network
- No firewall rule blocks the Elasticsearch ports
- The Elasticsearch and Java versions are the same on both nodes

You can optionally set the network.host parameter to the IP address to which you want Elasticsearch to bind; the other nodes then use this address to communicate with the node.
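The first requirement, a matching cluster.name, is easy to get wrong. As a sketch, assuming you have copies of both nodes' elasticsearch.yml files on one machine, the two values can be compared as follows. The function names are hypothetical, and the parsing only handles a simple, unindented `cluster.name: value` line.

```shell
#!/usr/bin/env bash
# Print the cluster.name value from an elasticsearch.yml-style file,
# with double quotes and trailing spaces removed. Commented-out lines are ignored.
cluster_name_of() {
    sed -n 's/^cluster\.name:[[:space:]]*//p' "$1" | head -n 1 | tr -d '"' | sed 's/[[:space:]]*$//'
}

# Succeed if both files name the same, non-empty cluster.
same_cluster() {
    local first second
    first="$(cluster_name_of "$1")"
    second="$(cluster_name_of "$2")"
    [ -n "$first" ] && [ "$first" = "$second" ]
}

# Example usage (file names are placeholders):
#   same_cluster node1.yml node2.yml || echo "cluster.name differs"
```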

Installing Elasticsearch plugins

Plugins provide extra functionalities in a customized manner. They can be used to query, monitor, and manage tasks. Thanks to the wide Elasticsearch community, there are several easy-to-use plugins available. In this book, I will be discussing some of them.

The Elasticsearch plugins come in two flavors:

- Site plugins: These are plugins that contain a site (web app) and no Java-related content. After installation, they are moved to the site directory and can be accessed using es_ip:port/_plugin/plugin_name.
- Java plugins: These mainly contain .jar files and are used to extend the functionality of Elasticsearch. An example is the Carrot2 plugin, which is used for text clustering.
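The es_ip:port/_plugin/plugin_name pattern for site plugins can be spelled out with a small helper. This is only an illustration: the function name `site_plugin_url` is hypothetical, and the sketch assumes the node is reached over plain HTTP.

```shell
#!/usr/bin/env bash
# Build the address of a site plugin from the node address and the plugin name.
# Assumption: the node is reached over plain HTTP.
site_plugin_url() {
    local es_ip="$1" port="$2" plugin_name="$3"
    echo "http://${es_ip}:${port}/_plugin/${plugin_name}"
}

# Prints http://localhost:9200/_plugin/head
site_plugin_url localhost 9200 head
```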

Elasticsearch ships with a plugin script that is located in the /usr/share/elasticsearch/bin directory, and any plugin can be installed using this script in the following format:

bin/plugin install plugin_url

Tip

Once the plugin is installed, you need to restart that node to make it active. In the following image, you can see the different plugins installed inside the Elasticsearch node. Plugins need to be installed separately on each node of the cluster.

The following is the layout of the plugin directory of Elasticsearch:

Checking for installed plugins

You can check the log of your node, which shows a line like the following at startup:

[2015-09-06 14:16:02,606][INFO ][plugins                  ] [Matt Murdock] loaded [clustering-carrot2, marvel], sites [marvel, carrot2, head]
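To pull just the plugin names out of a startup line like the one above, a sed expression is enough. The sketch below uses hypothetical function names and assumes the line has exactly the `loaded [...], sites [...]` shape shown, with the `loaded` list naming the Java plugins and the `sites` list naming the site plugins.

```shell
#!/usr/bin/env bash
# Read log lines on standard input and print the contents of the
# "loaded [...]" list from any plugins line.
loaded_plugins() {
    sed -n 's/.*] loaded \[\([^]]*\)], sites \[\([^]]*\)].*/\1/p'
}

# Same input, but print the contents of the "sites [...]" list.
site_plugins() {
    sed -n 's/.*] loaded \[\([^]]*\)], sites \[\([^]]*\)].*/\2/p'
}

# The sample line from the text above.
log_line='[2015-09-06 14:16:02,606][INFO ][plugins                  ] [Matt Murdock] loaded [clustering-carrot2, marvel], sites [marvel, carrot2, head]'
printf '%s\n' "$log_line" | loaded_plugins
printf '%s\n' "$log_line" | site_plugins
```

In practice you would pipe the node's log file into these functions instead of a single sample line.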

Alternatively, you can use the following command:

curl -XGET 'localhost:9200/_nodes/plugins?pretty'

Another option is to use the following URL in your browser:

http://localhost:9200/_nodes/plugins

Installing the Head plugin for Elasticsearch

The Head plugin is an easy-to-use web front end for the Elasticsearch cluster. It offers features such as graphical representations of shards and the cluster state, easy query creation, and downloading query-based data in the CSV format.

The following is the command to install the Head plugin:

sudo /usr/share/elasticsearch/bin/plugin install mobz/elasticsearch-head

Restart the Elasticsearch node with the following command to load the plugin:

sudo service elasticsearch restart

Once Elasticsearch is restarted, open the browser and type the following URL to access it through the Head plugin:

http://localhost:9200/_plugin/head

Note

More information about the Head plugin can be found here: https://github.com/mobz/elasticsearch-head

Installing Sense for Elasticsearch

Sense is an awesome tool to query Elasticsearch. You can add it to your latest version of Chrome, Safari, or Firefox browsers as an extension.

Now that Elasticsearch is installed and running on your system, and you have also installed the plugins, you are ready to create your first index and perform some basic operations.