Index aliasing and simplifying your everyday work using it
When working with multiple indexes in ElasticSearch, you can sometimes lose track of them. Imagine a situation where you store logs in your indexes. Usually, the number of log messages is quite large, so it is a good idea to divide the data somehow. A logical division is to create a single index for each day of logs (if you are interested in an open source solution for managing logs, look at Logstash at http://logstash.net). After a while, if we keep all the indexes, it becomes hard to tell which indexes are the newest, which ones should be used, which ones are from the last month, and maybe which data belongs to which client. With the help of aliases, we can work with a single name, just as we would with a single index, while actually working with multiple indexes.
An alias
What is an index alias? It's an additional name for one or more indexes that allows us to query those indexes by using that name. A single alias can point to multiple indexes and, conversely, a single index can be a part of multiple aliases.
However, please remember that you can't use an alias that has multiple indexes for indexing or real-time GET operations. ElasticSearch will throw an exception if you do that, because it doesn't know in which index the data should be indexed, or from which index the document should be fetched. We can still use an alias that links to only a single index for indexing though.
Creating an alias
To create an index alias, we need to send an HTTP POST request to the _aliases REST endpoint with an action defined. For example, the following request will create a new alias called week12 that will include the indexes named day10, day11, and day12:
curl -XPOST 'http://localhost:9200/_aliases' -d '{ "actions" : [ { "add" : { "index" : "day10", "alias" : "week12" } }, { "add" : { "index" : "day11", "alias" : "week12" } }, { "add" : { "index" : "day12", "alias" : "week12" } } ] }'
If the alias week12 isn't present in our ElasticSearch cluster, the preceding command will create it. If it is present, the command will just add the specified indexes to it.
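When an alias covers many indexes, writing the actions by hand gets tedious. The following is a minimal sketch of generating the request body from a list of index names. The build_add_actions function is a hypothetical helper, not part of ElasticSearch, and the commented-out curl call assumes a node listening on localhost:9200.

```shell
# Hypothetical helper (not part of ElasticSearch): prints the JSON body for an
# _aliases request that adds every given index to a single alias.
# Usage: build_add_actions ALIAS INDEX [INDEX...]
build_add_actions() {
  alias_name=$1
  shift
  sep=''
  printf '{ "actions" : [ '
  for index in "$@"; do
    # One "add" action per index, separated by commas
    printf '%s{ "add" : { "index" : "%s", "alias" : "%s" } }' "$sep" "$index" "$alias_name"
    sep=', '
  done
  printf ' ] }\n'
}

# Print the body only; nothing is sent to the cluster here.
build_add_actions week12 day10 day11 day12

# To send it (assumes a node on localhost:9200), pipe the body to curl:
# build_add_actions week12 day10 day11 day12 | curl -XPOST 'http://localhost:9200/_aliases' -d @-
```

The printed body should match the one used in the request shown earlier, so it can be checked by eye before it is sent.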
If everything goes well, instead of running a search across three indexes as follows:
curl -XGET 'http://localhost:9200/day10,day11,day12/_search?q=test'
We can run it as follows:
curl -XGET 'http://localhost:9200/week12/_search?q=test'
Isn't that better?
Modifying aliases
Of course, you can also remove indexes from an alias. Doing that is similar to how we add indexes to an alias, but instead of the add command, we use the remove one. For example, to remove the index named day9 from the week12 alias, we would run the following command:
curl -XPOST 'http://localhost:9200/_aliases' -d '{ "actions" : [ { "remove" : { "index" : "day9", "alias" : "week12" } } ] }'
Combining commands
The add and remove commands can be sent as a single request. For example, if you want to combine all the previously sent commands into a single request, you will have to send the following command:
curl -XPOST 'http://localhost:9200/_aliases' -d '{ "actions" : [ { "add" : { "index" : "day10", "alias" : "week12" } }, { "add" : { "index" : "day11", "alias" : "week12" } }, { "add" : { "index" : "day12", "alias" : "week12" } }, { "remove" : { "index" : "day9", "alias" : "week12" } } ] }'
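A common use of a combined request is to keep an alias pointing at the most recent daily indexes. The sketch below builds one body that adds the newest index and removes the one that just fell out of the window. The build_roll_actions helper and the prefix-plus-number index naming are assumptions made for illustration, following the day10, day11 naming used in this section.

```shell
# Hypothetical helper: prints an _aliases body that adds the newest index and
# removes the index that has dropped out of the window, in a single request.
# Index names are assumed to be PREFIX followed by a number (day9, day12, ...).
# Usage: build_roll_actions ALIAS PREFIX NEWEST WINDOW
build_roll_actions() {
  alias_name=$1
  prefix=$2
  newest=$3
  window=$4
  # The index to drop is WINDOW positions behind the newest one
  oldest=$((newest - window))
  printf '{ "actions" : [ { "add" : { "index" : "%s%s", "alias" : "%s" } }, { "remove" : { "index" : "%s%s", "alias" : "%s" } } ] }\n' \
    "$prefix" "$newest" "$alias_name" "$prefix" "$oldest" "$alias_name"
}

# Keep three days in week12: add day12, remove day9.
build_roll_actions week12 day 12 3
```

Sending both actions together means that clients searching through the alias do not have to be coordinated with two separate requests.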
Retrieving all aliases
In addition to adding or removing indexes to or from aliases, the applications that use ElasticSearch may need to retrieve all the aliases available in the cluster or all the aliases an index is connected to. To retrieve these aliases, we send a request using an HTTP GET command. For example, the first of the following commands gets all the aliases for the day10 index, and the second one gets all the available aliases:
curl -XGET 'localhost:9200/day10/_aliases'
curl -XGET 'localhost:9200/_aliases'
The response from the second command is as follows:
{ "day10" : { "aliases" : { "week12" : { } } }, "day11" : { "aliases" : { "week12" : { } } }, "day12" : { "aliases" : { "week12" : { } } } }
Filtering aliases
Aliases can be used in a similar way to how views are used in SQL databases. You can use the full Query DSL (discussed in detail in the Querying ElasticSearch section in the next chapter) to define a filter and have it applied to all count, search, delete by query, and similar operations run through the alias. Let's look at an example. Imagine that we want to have an alias that returns data for a certain client, so we can use it in our application. Let's say that the client identifier we are interested in is stored in the clientId field and we are interested in client 12345. So, let's create an alias named client for our data index, which will apply a filter on the clientId field automatically:
curl -XPOST 'http://localhost:9200/_aliases' -d '{ "actions" : [ { "add" : { "index" : "data", "alias" : "client", "filter" : { "term" : { "clientId" : "12345" } } } } ] }'
So, when using the preceding alias, your queries, counts, deletes by query, and similar operations will always be filtered by a term query that ensures that all the matched documents have the 12345 value in the clientId field.
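If the application serves several clients, one filtered alias per client can be created in a single request. The following sketch generates such a body. The build_client_aliases helper and the client_ID alias naming scheme are hypothetical and chosen only for illustration.

```shell
# Hypothetical helper: prints an _aliases body that creates one filtered alias
# per client identifier, each applying a term filter on the given field.
# Alias names follow an assumed client_<ID> scheme.
# Usage: build_client_aliases INDEX FIELD ID [ID...]
build_client_aliases() {
  index=$1
  field=$2
  shift 2
  sep=''
  printf '{ "actions" : [ '
  for id in "$@"; do
    printf '%s{ "add" : { "index" : "%s", "alias" : "client_%s", "filter" : { "term" : { "%s" : "%s" } } } }' \
      "$sep" "$index" "$id" "$field" "$id"
    sep=', '
  done
  printf ' ] }\n'
}

# Print the body for two clients stored in the data index.
build_client_aliases data clientId 12345 12346
```

Each generated alias can then be queried by name, and the application does not need to add the clientId filter itself.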
Aliases and routing
Similar to the aliases that use filtering, we can add routing values to the aliases. Imagine that we are using routing on the basis of user identifier and we want to use the same routing values with our aliases. For the alias named client, we will use the routing value of 12345 for indexing, and 12345,12346,12347 for querying (index routing can contain only a single value, while search routing may contain several comma-separated values). So, we create an alias with the following command:
curl -XPOST 'http://localhost:9200/_aliases' -d '{ "actions" : [ { "add" : { "index" : "data", "alias" : "client", "index_routing" : "12345", "search_routing" : "12345,12346,12347" } } ] }'
This way, when we index our data by using the client alias, the routing specified by the index_routing property will be used, and during query time, the routing specified by the search_routing property will be used.
If you run the following query with the preceding alias:
curl -XGET 'http://localhost:9200/client/_search?q=test&routing=99999,12345'
The value used as a routing value will be 12345. This is because ElasticSearch will take the common values of the search_routing attribute and the query routing parameter, which in our case is 12345.
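The "common values" rule can be illustrated with a small shell function that intersects two comma-separated lists. This is only a sketch of the behavior described above, not the code ElasticSearch runs, and common_routing is a hypothetical name.

```shell
# Hypothetical illustration: prints the comma-separated values that appear in
# both lists, in the order of the first list.
# Usage: common_routing ALIAS_SEARCH_ROUTING REQUEST_ROUTING
common_routing() {
  result=''
  old_ifs=$IFS
  IFS=','
  # Split the alias routing on commas and keep values also present in the request
  for value in $1; do
    case ",$2," in
      *",$value,"*) result="${result:+$result,}$value" ;;
    esac
  done
  IFS=$old_ifs
  printf '%s\n' "$result"
}

# The alias routes searches to 12345; the request asks for 99999 and 12345.
common_routing '12345' '99999,12345'

# The same request against an alias that lists several search routing values.
common_routing '12345,12346,12347' '99999,12345'
```

Both calls print 12345, the only value shared by the alias and the request.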