Dynamic mappings and templates
The previous topic described how we can define a type mapping when the mapping generated automatically by ElasticSearch is not sufficient. Now let's go one step back and see how automatic mapping works. Knowing this prevents surprises during the development of your applications and lets you build more flexible software. For example, if our application grows and automatically creates new indexes (say, for storing a massive number of time-based events), it is more convenient to adjust the mechanism that determines the data types. Also, if an application has many indexes, the ability to define mapping templates is very handy.
Type determining mechanism
ElasticSearch can guess the document structure by looking at the JSON, which defines the document. In JSON, strings are surrounded by quotation marks, Booleans are defined using specific words and numbers are just a few digits. This is a simple trick, but it usually works. For the following document:
{ "field1": 10, "field2": "10" }
field1 will be guessed as a long type, but field2 will be determined as a string. The other numeric types are guessed similarly. Of course, this can be a desired behavior, but sometimes the data source may omit the type information and everything may be presented as strings. The solution to this is enabling more aggressive text checking in the mapping definition. For example, we may do the following during index creation:
curl -XPUT http://localhost:9200/blog/?pretty -d '{
"mappings" : {
"article": {
"numeric_detection" : true
}
}
}'
Unfortunately, the same problem affects the Boolean type, and there is no option to force guessing Boolean types from text. In such cases, when changing the source format is impossible, we can only define the field directly in the mappings definition.
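The guessing rules described above can be modeled in a few lines of Python. This is only a simplified sketch of the behavior, not ElasticSearch's actual code; the guess_type function and its numeric_detection argument are hypothetical names chosen to mirror the mapping option, and field3 is an extra field added for illustration.

```python
import json


def guess_type(value, numeric_detection=False):
    """Guess a field type from a parsed JSON value (simplified model)."""
    # bool must be tested before int: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str) and numeric_detection:
        # With numeric detection on, text that parses as a number is
        # treated as a number instead of a string.
        try:
            int(value)
            return "long"
        except ValueError:
            pass
        try:
            float(value)
            return "double"
        except ValueError:
            pass
    # Everything else, including text such as "true", stays a string.
    return "string"


doc = json.loads('{ "field1": 10, "field2": "10", "field3": "true" }')
for name, value in doc.items():
    print(name, guess_type(value), guess_type(value, numeric_detection=True))
```

In this model, field2 changes from string to long once numeric detection is on, while the text "true" remains a string in both cases, which is the Boolean limitation mentioned above.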
Another type that causes trouble is date. ElasticSearch tries to guess the dates given as timestamps or strings that match the date format. Fortunately, a list of recognized formats can be defined as follows:
curl -XPUT http://localhost:9200/blog/?pretty -d '{ "mappings" : { "article" : { "dynamic_date_formats" : ["yyyy-MM-dd hh:mm"] } } }'
As in the previous example, the preceding command shows the mappings definition during index creation. The same definition also works in the PUT mapping API call of ElasticSearch. The date format syntax is the one used by the joda-time library (visit http://joda-time.sourceforge.net/api-release/org/joda/time/format/DateTimeFormat.html). As you can see, this allows you to adapt to almost any format that can be used in the input document. Note that dynamic_date_formats is an array. This means that we can handle several date formats simultaneously.
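The idea of accepting several date formats at once can be sketched in Python as follows. This is an analogy only: Python's strptime patterns use different letters than joda-time, the two formats in DATE_FORMATS are illustrative assumptions rather than translations of the pattern shown earlier, and detect_date is a hypothetical helper.

```python
from datetime import datetime

# Illustrative formats in Python's strptime syntax (not joda-time syntax).
DATE_FORMATS = ["%Y-%m-%d %H:%M", "%Y-%m-%d"]


def detect_date(text, formats=DATE_FORMATS):
    """Return a datetime for the first format that parses text, else None."""
    for fmt in formats:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue  # this format did not match, try the next one
    return None  # no format matched: the value would stay a plain string


print(detect_date("2013-01-05 10:30"))
print(detect_date("2013-01-05"))
print(detect_date("one hundred and seven"))
```

A value that matches none of the listed formats is simply not recognized as a date, which is why the list should cover every format the data source may produce.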
Now we know how ElasticSearch guesses what is in our document. The important point is that the server can make this guess for any new document. Let's use a simple case to check how it deals with changes:
curl -XPUT localhost:9200/objects/obj1/1?pretty -d '{ "field1" : 254}'
Now we have a new index called objects with a single document in it, a document with only a single field. This is obviously a number, isn't it? So let's query ElasticSearch and retrieve the automatically generated mappings:
curl -XGET localhost:9200/objects/_mapping?pretty
And the reply is as follows:
{ "objects" : { "obj1" : { "properties" : { "field1" : { "type" : "long", "ignore_malformed" : false } } } } }
No surprise here, we got what we expected (more or less). Now let's try something different—the second document with the same field name, but another value:
curl -XPUT localhost:9200/objects/obj1/2?pretty -d '{ "field1" : "one hundred and seven" }'
And the reply is as follows:
{ "error" : "MapperParsingException[Failed to parse [field1]]; nested: NumberFormatException[For input string: \"one hundred and seven\"]; ", "status" : 400 }
It doesn't work. ElasticSearch assumes that the field1 field is a number, and subsequent documents must fit this assumption. To be sure, let's have one more try:
curl -XPUT localhost:9200/objects/obj1/2?pretty -d '{ "field1" : 12.2 }'
This time we tried to index a number, but a number of a different type, and it succeeded. If we query for the mappings, we will notice that the type hasn't changed. ElasticSearch silently converted our value and truncated the fractional part. That's not good, but it can happen when the input data is not so good (it usually isn't), and this is why we sometimes want to turn off automatic mapping generation. Another reason for turning it off is that we may not want new fields, unknown during application development, to be added to an existing index. To turn off automatic field adding, we can set the dynamic property to false, as follows:
{ "objects" : { "obj1" : { "dynamic" : "false", "properties" : { ... } } } }
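The behavior seen in this walkthrough (the first document fixes the field type, later values are coerced or rejected, and dynamic set to false stops new fields from being added) can be summarized with a small Python model. It is a sketch under simplifying assumptions, not ElasticSearch's implementation; index_document, first_seen_type, and MappingError are hypothetical names.

```python
class MappingError(Exception):
    """Raised when a value cannot be coerced to the mapped type."""


def first_seen_type(value):
    """Guess a type for a field that is not yet in the mapping."""
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    return "string"


def index_document(mapping, doc, dynamic=True):
    """Coerce doc to the mapping and return the values that get indexed."""
    indexed = {}
    for name, value in doc.items():
        if name not in mapping:
            if not dynamic:
                continue  # unknown field: the mapping is left untouched
            mapping[name] = first_seen_type(value)
        if mapping[name] == "long":
            try:
                # Converting a float drops the fractional part (12.2 -> 12).
                indexed[name] = int(value)
            except (TypeError, ValueError):
                raise MappingError(f"Failed to parse [{name}]") from None
        else:
            indexed[name] = value
    return indexed


mapping = {}
print(index_document(mapping, {"field1": 254}), mapping)
print(index_document(mapping, {"field1": 12.2}), mapping)
print(index_document(mapping, {"field2": "new"}, dynamic=False), mapping)
```

Passing {"field1": "one hundred and seven"} to the same mapping raises MappingError in this model, mirroring the MapperParsingException shown earlier.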
Dynamic mappings
Sometimes we want to have the possibility of different type determination dependent on situations such as the field name and type defined in JSON. This is the situation in which dynamic templates can help. Dynamic templates are similar to the usual mappings. Each template has its pattern defined, which is applied to the document's field names. If a field matches the pattern, the template is used. The pattern can be defined in a few ways:
match: The template is used if the name of the field matches the pattern.
unmatch: The template is used if the name of the field doesn't match the pattern.
By default, the pattern is very simple and allows us to use the asterisk character. This can be changed by using match_pattern=regexp. After using this option, we can use all the magic provided by regular expressions.
There are variations such as path_match and path_unmatch that can be used to match the names in nested documents.
When writing a target field definition, the following variables can be used:
{name}: The name of the original field found in the input document
{dynamic_type}: The type determined from the original document
The last important bit of information is that ElasticSearch checks templates in order of their definitions and the first matching template is applied. This means that the most generic templates (for example, with "match": "*") should be defined at the end. Let's have a look at the following example:
{ "mappings" : { "article" : { "dynamic_templates" : [ { "template_test": { "match" : "*", "mapping" : { "type" : "multi_field", "fields" : { "{name}": { "type" : "{dynamic_type}"}, "str": {"type" : "string"} } } } } ] } } }
In the preceding example, we defined a mapping for the article type. In this mapping, we have only one dynamic template named template_test. This template is applied for every field in the input document because of the single asterisk pattern. Each field will be treated as a multi_field, consisting of a field named as the original field (for example, title) and the second field with the same name as the original field, suffixed with str (for example, title.str). The first of the created fields will have its type determined by ElasticSearch (with the {dynamic_type} type) and the second field will be a string (because of the string type).
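To make the first-match rule and the {name} and {dynamic_type} variables concrete, here is a minimal Python sketch of how a dynamic template could be resolved for one field. It is a model for illustration, not ElasticSearch's code: apply_dynamic_templates and substitute are hypothetical names, only match and unmatch are handled, and fnmatchcase accepts a few more wildcard characters than the simple asterisk pattern described above.

```python
from fnmatch import fnmatchcase


def substitute(node, name, dynamic_type):
    """Replace {name} and {dynamic_type} in all keys and string values."""
    if isinstance(node, dict):
        return {
            substitute(key, name, dynamic_type): substitute(val, name, dynamic_type)
            for key, val in node.items()
        }
    if isinstance(node, str):
        return node.replace("{name}", name).replace("{dynamic_type}", dynamic_type)
    return node


def apply_dynamic_templates(templates, field_name, dynamic_type):
    """Return the mapping of the first template that matches field_name."""
    for entry in templates:
        # Each list entry is a one-key object: {template_name: definition}.
        _template_name, definition = next(iter(entry.items()))
        if "match" in definition and not fnmatchcase(field_name, definition["match"]):
            continue
        if "unmatch" in definition and fnmatchcase(field_name, definition["unmatch"]):
            continue
        return substitute(definition["mapping"], field_name, dynamic_type)
    return None  # no template matched


article_templates = [
    {"template_test": {
        "match": "*",
        "mapping": {
            "type": "multi_field",
            "fields": {
                "{name}": {"type": "{dynamic_type}"},
                "str": {"type": "string"},
            },
        },
    }},
]

print(apply_dynamic_templates(article_templates, "title", "string"))
```

Because the loop returns on the first match, a more specific template placed after a "match": "*" entry would never be reached, which is why generic templates belong at the end of the list.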
Templates
As we have seen earlier in this chapter, the index configuration, and mappings in particular, can be complicated beasts. It would be very nice if we could define one or more mappings once and then use them in every newly created index, without the need to send them every time. ElasticSearch's creators anticipated this and included a feature called index templates. Each template defines a pattern, which is compared to the name of the newly created index. When both match, the values defined in the template are copied to the index structure definition. When multiple templates match the newly created index name, all of them are applied, and values from the templates applied later override those defined in the templates applied earlier. This is very convenient, because we can define a few common settings in the more general templates and override them in the more specialized ones. Additionally, there is an order parameter, which lets us force the desired template ordering. You can think of templates as dynamic mappings, which can be applied not to the types in documents, but to the indexes.
Let's see a real example of a template. Imagine that we want to create several indexes where we don't want to store the source of the documents so that the indexes will be smaller. We also don't need any replicas. Templates can be created by calling the ElasticSearch REST API, and an example cURL command would be similar to the following:
curl -XPUT http://localhost:9200/_template/main_template?pretty -d ' { "template" : "*", "order" : 1, "settings" : { "index.number_of_replicas" : 0 }, "mappings" : { "_default_" : { "_source" : { "enabled" : false } } } }'
From now on, all created indexes will have no replicas and no source stored. Note the _default_ type name in our example. This is a special type name indicating that the current rule should be applied to every document type. The second interesting thing is the order parameter. Let's define the next template with the following command:
curl -XPUT http://localhost:9200/_template/ha_template?pretty -d ' { "template" : "ha_*", "order" : 10, "settings" : { "index.number_of_replicas" : 5 } }'
All new indexes will behave as before except the ones with the names beginning with ha_. In this case, both templates are applied. First, the template with the lower order is used and then, the next template overwrites the replicas setting. So, these indexes will have five replicas and disabled source storage.
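The way the two templates combine can be traced with a short Python model. This is a sketch of the ordering rule only, not ElasticSearch's merge logic: resolve_templates is a hypothetical helper, and it merges settings and mappings with a shallow dictionary update, which is a simplification.

```python
from fnmatch import fnmatchcase

# The two templates defined above, keyed by template name.
TEMPLATES = {
    "main_template": {
        "template": "*",
        "order": 1,
        "settings": {"index.number_of_replicas": 0},
        "mappings": {"_default_": {"_source": {"enabled": False}}},
    },
    "ha_template": {
        "template": "ha_*",
        "order": 10,
        "settings": {"index.number_of_replicas": 5},
    },
}


def resolve_templates(templates, index_name):
    """Merge all templates matching index_name, lowest order first."""
    matching = [t for t in templates.values()
                if fnmatchcase(index_name, t["template"])]
    matching.sort(key=lambda t: t.get("order", 0))
    result = {"settings": {}, "mappings": {}}
    for template in matching:
        # Later (higher order) templates overwrite earlier values.
        result["settings"].update(template.get("settings", {}))
        result["mappings"].update(template.get("mappings", {}))
    return result


print(resolve_templates(TEMPLATES, "blog"))
print(resolve_templates(TEMPLATES, "ha_blog"))
```

For the blog index only main_template matches, so the replica count stays at 0; for ha_blog both match and the higher-order ha_template wins for the replicas setting while the _source rule from main_template is kept.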
There is one more important thing about this example. If we try to index a document into an index configured with five replicas and we have only a single node in the cluster, the request will probably fail after some time and display a message similar to the following:
{ "error" : "UnavailableShardsException[[ha_blog][2] [6] shardIt, [1] active : Timeout waiting for [1m], request: index {[ha_blog][article][1], source[\n{\n \"priority\" : 1,\n \"title\" : \"Test\"\n}]}]", "status" : 503 }
This is because ElasticSearch tries to create multiple copies of each of the shards of which the index is built, but this only makes sense when each of these copies can be placed on different server instances.
Storing templates in files
Templates can also be stored in files. By default, the files should be placed in the config/templates directory. For example, our ha_template should be placed in the config/templates/ha_template.json file and have the following contents:
{ "ha_template" : { "template" : "ha_*", "order" : 10, "settings" : { "index.number_of_replicas" : 5 } } }
Note that the structure of the JSON is a little bit different and has the template name as the main object key. The second important thing is that the templates must be placed in every instance of ElasticSearch. Also, the templates defined in the files are not available with the REST API calls.
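The difference between the two JSON structures is small enough to automate. The following Python sketch wraps a template body, as it would be sent through the REST API, under its name and writes it to a templates directory. The write_template_file helper is hypothetical, and a temporary directory stands in for the real config directory, so nothing here touches an ElasticSearch installation.

```python
import json
import tempfile
from pathlib import Path


def write_template_file(config_dir, name, body):
    """Write body to <config_dir>/templates/<name>.json, keyed by name."""
    templates_dir = Path(config_dir) / "templates"
    templates_dir.mkdir(parents=True, exist_ok=True)
    path = templates_dir / f"{name}.json"
    # The file format uses the template name as the main object key.
    path.write_text(json.dumps({name: body}, indent=2))
    return path


# The body of ha_template as sent to the REST API earlier.
ha_template = {
    "template": "ha_*",
    "order": 10,
    "settings": {"index.number_of_replicas": 5},
}

# A temporary directory is used here instead of the real config directory.
with tempfile.TemporaryDirectory() as config_dir:
    path = write_template_file(config_dir, "ha_template", ha_template)
    relative = path.relative_to(config_dir).as_posix()
    loaded = json.loads(path.read_text())

print(relative)
print(loaded)
```

Since file-based templates must be present on every instance, a script like this would have to be run (or its output copied) on each node.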