Defining index mapping
In Elasticsearch, mapping refers to the process of defining the schema or structure of an index. It defines how documents and their fields are stored and indexed within Elasticsearch. Mapping allows you to specify the data type of each field, such as text, keyword, numeric, or date, and to configure various properties for each field, including indexing options and analyzers. By defining a mapping, you provide Elasticsearch with crucial information about the data you intend to index, enabling it to efficiently store, search, and analyze the documents.
Mapping plays a critical role in delivering precise search results, efficient data storage, and effective handling of different data types within Elasticsearch.
When no mapping is predefined, Elasticsearch attempts to dynamically infer data types and create the mapping; this is what has occurred with our movie dataset thus far.
In this recipe, we will apply an explicit mapping to the movies index.
Getting ready
Make sure that you have completed the Updating data in Elasticsearch recipe.
All the command snippets for the Dev Tools in this recipe are available at https://github.com/PacktPublishing/Elastic-Stack-8.x-Cookbook/blob/main/Chapter2/snippets.md#defining-index-mapping.
How to do it…
You can define mappings during index creation or update them in an existing index.
An important note on mappings
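When you update the mapping of an existing index that already contains documents, those documents are not re-indexed: the change only applies to documents indexed afterward. In addition, the type of an existing field cannot be changed in place, which is why this recipe creates a new index and reindexes the data into it.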
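As a minimal sketch of what a mapping update looks like, the following request adds a new field to the existing movies index through the update mapping API. The rating field and its float type are hypothetical and used only for illustration; they are not part of the recipe's dataset:

```console
PUT /movies/_mapping
{
  "properties": {
    "rating": { "type": "float" }
  }
}
```

Running this request is optional; the rest of the recipe does not depend on the rating field.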
In this recipe, you are going to create a new index with an explicit mapping and then reindex the data from the movies index, assuming that you have already created that index:
- Head to Kibana | Dev Tools.
- Next, let’s check the mapping of the previously created index with the following command:
GET /movies/_mapping
You will get the results shown in the following figure. Note that, for readability, some fields were collapsed.
Figure 2.13 – The default mapping on the movies index
Let’s review what’s going on in the figure:
a. The current mapping of the genre field uses the multi-field mapping technique. This approach allows a single field to be indexed in several ways to serve different purposes. Here, the genre field is indexed both as a text field for full-text search and as a keyword field for sorting and aggregation. This dual mapping is well suited to the intended use cases of the genre field, so it can stay as it is.
b. The release_year field is currently indexed as a text field, which is not optimal: it holds numerical data, and a numeric type enables range queries and other numeric-specific operations. Retaining the keyword sub-field remains useful for sorting and aggregation. The next step is therefore to apply an explicit mapping that treats release_year as a numeric field.
c. Two other fields require mapping adjustments: plot and cast. Given their nature, these fields should be indexed solely as text, since it is unlikely that you will need to sort or aggregate on them. This indexing strategy still allows for effective full-text search against them.
- Now, let’s create a new index with the correct explicit mapping for the cast, plot, and release_year fields:
PUT movies-with-explicit-mapping
{
  "mappings": {
    "properties": {
      "release_year": {
        "type": "short",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "cast": { "type": "text" },
      "plot": { "type": "text" }
    }
  }
}
- Next, reindex the original data in the new index so that the new mapping is applied:
POST /_reindex
{
  "source": {
    "index": "movies"
  },
  "dest": {
    "index": "movies-with-explicit-mapping"
  }
}
- Check whether the new mapping has been applied to the new index:
GET movies-with-explicit-mapping/_mapping
Figure 2.14 shows the explicit mapping applied to the index:
Figure 2.14 – Explicit mapping
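As an optional sanity check, the requests below sketch how you might confirm that the reindex copied every document and that release_year now behaves as a number. The year bounds (1990 and 1999) are arbitrary values chosen for illustration:

```console
# Both counts should match once the new index has refreshed
GET /movies/_count
GET /movies-with-explicit-mapping/_count

# Range query and numeric sort, possible now that release_year is a short
GET /movies-with-explicit-mapping/_search
{
  "query": {
    "range": {
      "release_year": { "gte": 1990, "lte": 1999 }
    }
  },
  "sort": [
    { "release_year": "asc" }
  ]
}
```

If release_year is stored as a string in the source documents, as its original text mapping suggests, the values are still accepted because numeric fields coerce numeric strings by default.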
How it works...
Explicit mapping lets you define the schema of your index yourself. Instead of relying on dynamic mapping, which detects the type of each field automatically from the first document in which that field appears, explicit mapping gives you full control over the data types, field properties, and analysis settings for each field in your index, as shown in Figure 2.15:
Figure 2.15 – The field mapping options
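To give a rough idea of how such options look in practice, here is a hedged sketch of a mapping request that uses a few common field parameters. The index name mapping-options-demo and the fields title, imdb_id, and release_date are hypothetical and not part of the movies dataset:

```console
PUT /mapping-options-demo
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "english",
        "fields": {
          "keyword": { "type": "keyword", "ignore_above": 256 }
        }
      },
      "imdb_id": {
        "type": "keyword",
        "doc_values": false
      },
      "release_date": {
        "type": "date",
        "format": "yyyy-MM-dd"
      }
    }
  }
}
```

In this sketch, analyzer selects how the text is tokenized, ignore_above skips indexing keyword values longer than the given length, doc_values set to false disables the on-disk structure used for sorting and aggregations, and format restricts the date patterns the field accepts.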
There’s more…
Mapping is a key aspect of data modeling in Elasticsearch. Avoid relying on dynamic mapping and try, when possible, to explicitly define your mappings to have better control over the field types, properties, and analysis settings. This helps maintain consistency and avoids unexpected field mappings.
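One way to enforce this advice, sketched here as an assumption rather than as part of the recipe, is to set the dynamic parameter to strict so that documents containing unmapped fields are rejected instead of silently extending the mapping. The index name movies-strict-demo and the sample document are hypothetical:

```console
PUT /movies-strict-demo
{
  "mappings": {
    "dynamic": "strict",
    "properties": {
      "title": { "type": "text" },
      "release_year": { "type": "short" }
    }
  }
}

# Expected to be rejected: "director" is not declared in the mapping
POST /movies-strict-demo/_doc
{
  "title": "Example title",
  "release_year": 2001,
  "director": "Example director"
}
```

Setting dynamic to false is a softer alternative: unmapped fields are kept in _source but are not indexed.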
Consider using multi-field mapping to index the same field in different ways, depending on the use case. For instance, full-text search on a string field requires a text mapping, whereas aggregations, filtering, or sorting on the same string are more efficient with a keyword mapping.
Also consider using mapping limit settings to prevent a mapping explosion (https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-settings-limit.html). A mapping explosion occurs when ingested documents keep introducing new fields, as can happen with dynamic mapping, so that the index mapping grows excessively. This can lead to memory shortages and make recovery difficult.
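To illustrate the multi-field advice above, the following sketch runs against the original movies index, where genre is mapped as text with a keyword sub-field. The search term comedy and the aggregation name genres are arbitrary example values:

```console
# Full-text search uses the analyzed text field
GET /movies/_search
{
  "query": {
    "match": { "genre": "comedy" }
  }
}

# Aggregations use the keyword sub-field
GET /movies/_search
{
  "size": 0,
  "aggs": {
    "genres": {
      "terms": { "field": "genre.keyword" }
    }
  }
}
```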
When it comes to mapping limit settings, there are several best practices to keep in mind. First, limit the number of field mappings to prevent documents from causing a mapping explosion. Second, limit the maximum depth of a field. Third, restrict the number of different nested mappings an index can have. Fourth, set a maximum for the count of nested JSON objects allowed in a single document, across all nested types. Finally, limit the maximum length of a field name. Keep in mind that setting higher limits can affect performance and cause memory problems.
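The five practices above correspond, in order, to the index settings shown in this sketch. The index name mapping-limits-demo is hypothetical, and the values are illustrative only, not recommendations; choose limits that fit your own data:

```console
PUT /mapping-limits-demo
{
  "settings": {
    "index.mapping.total_fields.limit": 500,
    "index.mapping.depth.limit": 10,
    "index.mapping.nested_fields.limit": 20,
    "index.mapping.nested_objects.limit": 5000,
    "index.mapping.field_name_length.limit": 100
  }
}
```

With these settings in place, a mapping update or document that would exceed one of the limits is rejected rather than allowed to grow the mapping.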
For many years now, Elastic has been developing a specification called Elastic Common Schema (ECS) that provides a consistent and customizable way to structure data in Elasticsearch. Adopting this mapping has a lot of benefits (data correlation, reuse, and future-proofing, to name a few), and as a best practice, always refer to the ECS convention when you consider naming your fields. We will see more examples using ECS in the next chapters.
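As a small, hedged illustration of ECS-style naming, the mapping below uses a handful of well-known ECS field names. The index name ecs-demo is hypothetical, and the types shown reflect our understanding of the ECS field reference; check the official ECS documentation before relying on them:

```console
PUT /ecs-demo
{
  "mappings": {
    "properties": {
      "@timestamp": { "type": "date" },
      "event": {
        "properties": {
          "category": { "type": "keyword" }
        }
      },
      "host": {
        "properties": {
          "name": { "type": "keyword" }
        }
      },
      "source": {
        "properties": {
          "ip": { "type": "ip" }
        }
      }
    }
  }
}
```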
See also
- For a comprehensive introduction to Elasticsearch mapping, see the following: https://www.elastic.co/blog/found-elasticsearch-mapping-introduction
- For insights on preventing mapping explosion, read this informative blog article: https://www.elastic.co/blog/3-ways-to-prevent-mapping-explosion-in-elasticsearch