ElasticSearch Server

You're reading from ElasticSearch Server. Whether you're experienced in search servers or a newcomer, this book empowers you to get to grips with the speed and flexibility of ElasticSearch. A reader-friendly approach, including lots of hands-on examples, makes learning a pleasure.

Product type Paperback
Published in Feb 2013
Publisher Packt
ISBN-13 9781849518444
Length 318 pages
Edition 1st Edition

Dynamic mappings and templates


The previous topic described how to define a type mapping when the mapping generated automatically by ElasticSearch is not sufficient. Now let's go one step back and see how automatic mapping works. Knowing this prevents surprises during the development of your applications and lets you build more flexible software. For example, if our application grows and automatically creates new indexes (say, for storing a massive number of time-based events), it is more convenient to adjust the mechanism that determines the data types. Also, if an application has many indexes, the ability to define mapping templates is very handy.

Type determining mechanism

ElasticSearch can guess the document structure by looking at the JSON that defines the document. In JSON, strings are surrounded by quotation marks, Booleans are written as the specific words true and false, and numbers are bare digits. This is a simple trick, but it usually works. For the following document:

{
  "field1": 10,
  "field2": "10"
}

field1 will be guessed as a long type, while field2 will be determined as a string. The other numeric types are guessed similarly. Often this is the desired behavior, but sometimes the data source omits the type information and presents everything as strings. The solution is to enable more aggressive text checking in the mapping definition. For example, we may do the following during index creation:

curl -XPUT http://localhost:9200/blog/?pretty -d '{
  "mappings" : {
    "article": {
      "numeric_detection" : true
    }
  }
}'
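The guessing rule and the effect of numeric_detection can be modelled in a few lines of Python. This is only a rough sketch of the behavior described above, not ElasticSearch's actual implementation; the guess_type function and the extra field3 field are assumptions made for illustration.

```python
import json

def guess_type(value, numeric_detection=False):
    """Roughly model how a field type is guessed from a parsed JSON value.

    Hypothetical helper for illustration only; the real logic lives inside
    ElasticSearch and handles more cases (dates, objects, arrays).
    """
    # bool must be tested before int: in Python, True is also an int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        if numeric_detection:
            # With detection on, a string that parses as a number is
            # treated as one. Python's int() and float() are a little
            # more lenient than a strict parser, so this is approximate.
            try:
                int(value)
                return "long"
            except ValueError:
                pass
            try:
                float(value)
                return "double"
            except ValueError:
                pass
        return "string"
    raise TypeError("unsupported value: %r" % (value,))

# field3 is an invented extra field holding a Boolean written as text.
doc = json.loads('{"field1": 10, "field2": "10", "field3": "true"}')
default_guess = {name: guess_type(v) for name, v in doc.items()}
numeric_guess = {name: guess_type(v, numeric_detection=True) for name, v in doc.items()}
print(default_guess)
print(numeric_guess)
```

In this model, field2 changes from string to long once numeric detection is enabled, while field3 remains a string in both runs.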

Unfortunately, the same problem affects the Boolean type, and there is no option to force guessing Boolean values from text. In such cases, when changing the source format is impossible, we can only define the field explicitly in the mappings definition.

Another type that causes trouble is date. ElasticSearch tries to guess the dates given as timestamps or strings that match the date format. Fortunately, a list of recognized formats can be defined as follows:

curl -XPUT http://localhost:9200/blog/?pretty -d '{
  "mappings" : {
    "article" : {
      "dynamic_date_formats" : ["yyyy-MM-dd hh:mm"]
    }
  }
}'

As in the previous example, the preceding command shows the mappings definition during index creation. Similarly, this works in the PUT mapping API call of ElasticSearch. The date format syntax is the one used by the joda-time library (visit http://joda-time.sourceforge.net/api-release/org/joda/time/format/DateTimeFormat.html). As you can see, this allows you to adapt to almost any format that can be used in the input document. Note that dynamic_date_formats is an array. This means that we can handle several date formats simultaneously.
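The idea of trying a list of formats in turn can be sketched in Python. This is a loose analogy rather than ElasticSearch's code: Python's strptime patterns stand in for the joda-time patterns, the second format is an invented addition, and detect_date is a hypothetical helper.

```python
from datetime import datetime

# strptime patterns used as assumed rough equivalents of joda-time ones:
# "yyyy-MM-dd hh:mm" -> "%Y-%m-%d %I:%M", "yyyy/MM/dd" -> "%Y/%m/%d".
FORMATS = ["%Y-%m-%d %I:%M", "%Y/%m/%d"]

def detect_date(value, formats=FORMATS):
    """Return the first format that parses value, or None if none does."""
    for fmt in formats:
        try:
            datetime.strptime(value, fmt)
            return fmt
        except ValueError:
            continue
    return None

print(detect_date("2013-02-01 10:30"))
print(detect_date("2013/02/01"))
print(detect_date("one hundred and seven"))
```

A string that matches none of the listed formats yields None here, which corresponds to the field being treated as an ordinary string. In this Python stand-in, a value such as 2013-02-01 13:30 is also rejected, because %I is a 12-hour field.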

Now we know how ElasticSearch guesses what is in our document. The important point is that the server can make this guess for any new document. Let's use a simple case to check how it deals with changes:

curl -XPUT localhost:9200/objects/obj1/1?pretty -d '{ "field1" : 254}'

Now we have a new index called objects with a single document in it—a document with only a single field. This is obviously a number, isn't it? So let's query ElasticSearch and retrieve the automatically generated mappings:

curl -XGET localhost:9200/objects/_mapping?pretty

And the reply is as follows:

{
  "objects" : {
    "obj1" : {
      "properties" : {
        "field1" : {
          "type" : "long",
          "ignore_malformed" : false
        }
      }
    }
  }
}

No surprise here, we got what we expected (more or less). Now let's try something different—the second document with the same field name, but another value:

curl -XPUT localhost:9200/objects/obj1/2?pretty -d '{
 "field1" : "one hundred and seven"
}'

And the reply is as follows:

{
  "error" : "MapperParsingException[Failed to parse [field1]]; 
  nested: NumberFormatException[For input string: 
  \"one hundred and seven\"]; ",
  "status" : 400
}

It doesn't work. ElasticSearch has assumed that field1 is a number, and successive documents must fit this assumption. To be sure, let's try once more:

curl -XPUT localhost:9200/objects/obj1/2?pretty -d '{
 "field1" : 12.2
}'

This time we tried to index a number, but a number of a different type, and it succeeded. If we query for the mappings, we will notice that the type hasn't changed. ElasticSearch silently changed our value and truncated the fractional part. This is not good, but it can happen when the input data is not so good (it usually isn't), and this is why we sometimes want to turn off automatic mapping generation. Another reason for turning it off is that we may not want fields that were unknown during application development to be added to an existing index. To turn off automatic field adding, we can set the dynamic property to false, as follows:

{
  "objects" : {
    "obj1" : {
      "dynamic" : "false",
      "properties" : {
      ...
      }
    }
  }
}
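The three indexing attempts above, and the effect of setting dynamic to false, can be mimicked with a toy model. The TypeModel class is hypothetical and deliberately tiny; it only reproduces the behavior described in this section and says nothing about how ElasticSearch is implemented.

```python
def guess(value):
    """Guess a type name from a Python value (simplified, illustrative)."""
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    return "string"

class TypeModel:
    """Toy model of one mapping type whose field types are fixed on first use."""

    def __init__(self, properties=None, dynamic=True):
        self.properties = dict(properties or {})  # field name -> type name
        self.dynamic = dynamic                    # False: never add new fields

    def index(self, doc):
        """Return the values as they would be indexed under the mapping."""
        indexed = {}
        for name, value in doc.items():
            if name not in self.properties:
                if not self.dynamic:
                    continue  # unknown field: not added to the mapping
                self.properties[name] = guess(value)
            if self.properties[name] == "long":
                # int() raises ValueError for non-numeric text and drops
                # the fractional part of a float.
                value = int(value)
            elif self.properties[name] == "double":
                value = float(value)
            indexed[name] = value
        return indexed

obj1 = TypeModel()
obj1.index({"field1": 254})
try:
    obj1.index({"field1": "one hundred and seven"})
except ValueError as exc:
    print("rejected:", exc)
truncated = obj1.index({"field1": 12.2})
print(truncated, obj1.properties)

# With dynamic set to False, a field outside the mapping is left out.
strict = TypeModel(properties={"field1": "long"}, dynamic=False)
print(strict.index({"field1": 1, "field2": "new"}), strict.properties)
```

The model keeps field1 as long after the first document, rejects the textual value, and stores 12 for 12.2, which mirrors the results seen with the real server.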

Dynamic mappings

Sometimes we want the type to be determined differently depending on circumstances such as the field name and the type detected in the JSON. This is where dynamic templates can help. Dynamic templates are similar to the usual mappings. Each template defines a pattern, which is applied to the document's field names. If a field matches the pattern, the template is used. The pattern can be defined in a few ways:

  • match: The template is used if the name of the field matches the pattern.

  • unmatch: The template is used if the name of the field doesn't match the pattern.

By default, the pattern is very simple and allows only the asterisk character as a wildcard. This can be changed by setting match_pattern to regex. After using this option, we can use all the magic provided by regular expressions.

There are also variations, path_match and path_unmatch, that can be used to match field names in nested documents.

When writing a target field definition, the following variables can be used:

  • {name}: The name of the original field found in the input document

  • {dynamic_type}: The type determined from the original document

The last important bit of information is that ElasticSearch checks templates in order of their definitions and the first matching template is applied. This means that the most generic templates (for example, with "match": "*") should be defined at the end. Let's have a look at the following example:

{
  "mappings" : {
    "article" : {
      "dynamic_templates" : [
        {
          "template_test": {
            "match" : "*",
            "mapping" : {
              "type" : "multi_field",
              "fields" : {
                "{name}": { "type" : "{dynamic_type}"},
                "str": {"type" : "string"}
              }
            }
          }
        }
      ]
    }
  }
}

In the preceding example, we defined a mapping for the article type. In this mapping, we have only one dynamic template named template_test. This template is applied for every field in the input document because of the single asterisk pattern. Each field will be treated as a multi_field, consisting of a field named as the original field (for example, title) and the second field with the same name as the original field, suffixed with str (for example, title.str). The first of the created fields will have its type determined by ElasticSearch (with the {dynamic_type} type) and the second field will be a string (because of the string type).
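The first-match rule and the variable substitution can be illustrated with a short Python sketch. It is an approximation under stated assumptions: fnmatchcase stands in for the simple asterisk matching (it also understands ? and [] patterns), the ids template is invented for the example, and resolve and fill are hypothetical helpers.

```python
from fnmatch import fnmatchcase

def fill(node, name, dynamic_type):
    """Recursively substitute {name} and {dynamic_type} in keys and values."""
    if isinstance(node, dict):
        return {fill(k, name, dynamic_type): fill(v, name, dynamic_type)
                for k, v in node.items()}
    if isinstance(node, list):
        return [fill(item, name, dynamic_type) for item in node]
    if isinstance(node, str):
        return node.replace("{name}", name).replace("{dynamic_type}", dynamic_type)
    return node

def resolve(templates, name, dynamic_type):
    """Return the mapping of the first template whose pattern fits name."""
    for entry in templates:
        # Each list entry holds exactly one template, keyed by its name.
        (template,) = entry.values()
        if "match" in template and not fnmatchcase(name, template["match"]):
            continue
        if "unmatch" in template and fnmatchcase(name, template["unmatch"]):
            continue
        return fill(template["mapping"], name, dynamic_type)
    return None  # no template applies; default guessing would be used

dynamic_templates = [
    # Invented, more specific template placed first so that it can win.
    {"ids": {"match": "*_id",
             "mapping": {"type": "string", "index": "not_analyzed"}}},
    # The generic template from the example above, placed last.
    {"template_test": {"match": "*",
                       "mapping": {"type": "multi_field",
                                   "fields": {"{name}": {"type": "{dynamic_type}"},
                                              "str": {"type": "string"}}}}},
]

print(resolve(dynamic_templates, "title", "string"))
print(resolve(dynamic_templates, "user_id", "long"))
```

If the two entries were swapped, the generic template would match user_id first and the ids template would never be used, which is why the most generic patterns belong at the end of the list.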

Templates

As we have seen earlier in this chapter, the index configuration, and mappings in particular, can be complicated beasts. It would be very nice to be able to define one or more mappings once and then use them in every newly created index, without the need to send them every time. ElasticSearch's creators anticipated this and included a feature called index templates. Each template defines a pattern, which is compared to the newly created index name. When both match, the values defined in the template are copied to the index structure definition. When multiple templates match the newly created index name, all of them are applied, and values from the templates applied later override those defined in the templates applied earlier. This is very convenient, because we can define a few common settings in the more general templates and override them in more specialized ones. Additionally, there is an order parameter, which lets us force the desired template ordering. You can think of templates as dynamic mappings that are applied not to the types in documents, but to the indexes.

Let's see a real example of a template. Imagine that we want to create several indexes where we don't want to store the source of the documents, so that the indexes will be smaller. We also don't need any replicas. Templates can be created by calling the ElasticSearch REST API, and an example cURL command would be similar to the following:

curl -XPUT http://localhost:9200/_template/main_template?pretty -d '
{
  "template" : "*",
  "order" : 1,
  "settings" : {
    "index.number_of_replicas" : 0
  },
  "mappings" : {
    "_default_" : {
      "_source" : {
        "enabled" : false
      }
    }
  }
}'

From now on, all created indexes will have no replicas and no source stored. Note the _default_ type name in our example. This is a special type name indicating that the current rule should be applied to every document type. The second interesting thing is the order parameter. Let's define the next template with the following command:

curl -XPUT http://localhost:9200/_template/ha_template?pretty -d '
{
  "template" : "ha_*",
  "order" : 10,
  "settings" : {
    "index.number_of_replicas" : 5
  }
}'

All new indexes will behave as before, except those whose names begin with ha_. For these, both templates are applied. First, the template with the lower order is used, and then the next template overwrites the replicas setting. So these indexes will have five replicas and disabled source storage.
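The way matching templates are layered by their order value can be sketched as a sort followed by a recursive merge. This is a simplified model of the behavior described above, not the server's code; effective_config and deep_merge are hypothetical helpers, and fnmatchcase only approximates the pattern matching.

```python
from fnmatch import fnmatchcase

# The two templates defined above, expressed as Python dictionaries.
TEMPLATES = {
    "main_template": {
        "template": "*", "order": 1,
        "settings": {"index.number_of_replicas": 0},
        "mappings": {"_default_": {"_source": {"enabled": False}}},
    },
    "ha_template": {
        "template": "ha_*", "order": 10,
        "settings": {"index.number_of_replicas": 5},
    },
}

def deep_merge(base, override):
    """Return a copy of base with override merged in; override wins."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

def effective_config(index_name, templates=TEMPLATES):
    """Apply every matching template, lowest order first."""
    matching = [t for t in templates.values()
                if fnmatchcase(index_name, t["template"])]
    config = {}
    for template in sorted(matching, key=lambda t: t["order"]):
        body = {k: v for k, v in template.items()
                if k not in ("template", "order")}
        config = deep_merge(config, body)
    return config

print(effective_config("blog"))
print(effective_config("ha_blog"))
```

For ha_blog the replicas value from ha_template replaces the one from main_template, while the _source setting, which ha_template does not mention, is carried over unchanged.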

There is one more important thing about this example. If we try to index a document into an index with five replicas and we have only a single node in the cluster, the request will probably fail after some time and return a message similar to the following:

{
  "error" : "UnavailableShardsException[[ha_blog][2] [6] shardIt, 
  [1] active : Timeout waiting for [1m], request: index 
  {[ha_blog][article][1], source[\n{\n  \"priority\" : 1,\n  
  \"title\" : \"Test\"\n}]}]",
  "status" : 503
}

This is because ElasticSearch tries to create multiple copies of each shard that makes up the index, but this only makes sense when each of these copies can be placed on a different server instance.
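The arithmetic behind this can be written down in a couple of lines. The sketch assumes, as the explanation above implies, that two copies of the same shard are never placed on the same node; shard_copies is a hypothetical helper.

```python
def shard_copies(replicas, nodes):
    """Return (wanted, assignable) copies of a single shard."""
    wanted = 1 + replicas            # one primary plus its replicas
    assignable = min(wanted, nodes)  # at most one copy per node
    return wanted, assignable

print(shard_copies(replicas=5, nodes=1))  # prints (6, 1)
```

The two numbers appear to correspond to the [6] and [1] reported in the error message for the ha_blog index.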

Storing templates in files

Templates can also be stored in files. By default, the files should be placed in the config/templates directory. For example, our ha_template should be placed in the config/templates/ha_template.json file and have the following contents:

{
  "ha_template" : {
    "template" : "ha_*",
    "order" : 10,
    "settings" : {
      "index.number_of_replicas" : 5
    }
  }
}

Note that the structure of the JSON is a little bit different and has the template name as the main object key. The second important thing is that the templates must be placed on every instance of ElasticSearch. Also, the templates defined in files are not available through REST API calls.
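Converting a template body from the REST form to the file form amounts to wrapping it under its name. The following sketch shows one way to do that; write_template_file is a hypothetical helper, and a temporary directory stands in for a node's real config directory.

```python
import json
import os
import tempfile

def write_template_file(config_dir, name, body):
    """Write body to <config_dir>/templates/<name>.json, keyed by name."""
    templates_dir = os.path.join(config_dir, "templates")
    os.makedirs(templates_dir, exist_ok=True)
    path = os.path.join(templates_dir, name + ".json")
    with open(path, "w") as handle:
        # The file form uses the template name as the top-level key.
        json.dump({name: body}, handle, indent=2)
    return path

# The same body that was sent to the REST API for ha_template.
rest_body = {
    "template": "ha_*",
    "order": 10,
    "settings": {"index.number_of_replicas": 5},
}

# Assumption: a throwaway directory replaces the real config directory.
config_dir = tempfile.mkdtemp()
path = write_template_file(config_dir, "ha_template", rest_body)
with open(path) as handle:
    stored = json.load(handle)
print(path)
print(json.dumps(stored, indent=2))
```

Because file-based templates must be present on every instance, a helper like this would need to be run, or its output copied, for each node's config directory.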
