Managing nested objects
There is a special type of embedded object called a nested object. It resolves a problem related to Lucene's indexing architecture, in which all the fields of embedded objects are stored as fields of a single document (technically speaking, they are flattened). During a search, Lucene cannot tell which values belong to which embedded object within the same multi-valued array.
If we consider the previous order example, it's not possible to match an item's name together with its quantity in the same query, since Lucene puts the values of all items in the same Lucene document. We need to index the items in separate documents and then join them. This entire process is managed by nested objects and nested queries.
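The flattening problem described above can be illustrated with a small Python model. This is only a sketch, not Elasticsearch or Lucene code: the order document, its item names, and the helper functions are hypothetical and exist purely to show why a flattened array can produce a false match.

```python
# Simplified model of how an "object" field is flattened versus how a
# "nested" field keeps each item separate. Illustration only; the sample
# order and all function names are hypothetical.

order = {
    "id": "1234",
    "item": [
        {"name": "tshirt", "quantity": 10},
        {"name": "socks", "quantity": 2},
    ],
}


def flatten_object_field(doc, field):
    """Collapse a list of embedded objects into multi-valued fields."""
    flat = {}
    for obj in doc[field]:
        for key, value in obj.items():
            flat.setdefault(f"{field}.{key}", []).append(value)
    return flat


def matches_flattened(flat, name, quantity):
    """Each condition is checked against the whole multi-valued field."""
    return name in flat["item.name"] and quantity in flat["item.quantity"]


def matches_nested(doc, name, quantity):
    """Both conditions must hold inside the same embedded object."""
    return any(
        obj["name"] == name and obj["quantity"] == quantity
        for obj in doc["item"]
    )


flat = flatten_object_field(order, "item")
print(flat)  # {'item.name': ['tshirt', 'socks'], 'item.quantity': [10, 2]}
print(matches_flattened(flat, "tshirt", 2))  # True: a false positive
print(matches_nested(order, "tshirt", 2))    # False: no single item matches
```

In the flattened form, "tshirt" and 2 both appear somewhere in the order, so the combined condition matches even though no single item has that name and quantity. Keeping each item separate removes the false positive.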
Getting ready
You will need an up-and-running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe of Chapter 1, Getting Started.
To execute the commands in this recipe, you can use any HTTP client, such as curl (https://curl.haxx.se/), Postman (https://www.getpostman.com/), or similar. I suggest using the Kibana console, which provides code completion and better character escaping for Elasticsearch.
How to do it…
A nested object is defined as a standard object with the nested type.
Regarding the example in the Mapping an object recipe, we can change the type from object to nested, as follows:
PUT test/_mapping
{
  "properties": {
    "id": {"type": "keyword"},
    "date": {"type": "date"},
    "customer_id": {"type": "keyword"},
    "sent": {"type": "boolean"},
    "item": {
      "type": "nested",
      "properties": {
        "name": {"type": "keyword"},
        "quantity": {"type": "long"},
        "price": {"type": "double"},
        "vat": {"type": "double"}
      }
    }
  }
}
How it works…
When a document is indexed, if an embedded object has been marked as nested, it's extracted from the original document, indexed as a new separate document, and saved in a special index position near the parent document.
In the preceding example, we reused the mapping from the Mapping an object recipe, but we changed the type of the item field from object to nested. No other action is required to convert an embedded object into a nested one.
Nested objects are special Lucene documents that are saved in the same block of data as their parent; this approach allows for fast joining with the parent document.
Nested objects are not searchable with standard queries, only with nested ones. They are also not shown as separate hits in standard query results.
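As a rough sketch of what searching nested objects looks like, the following Python snippet builds the body of a nested query against the item field of the mapping in this recipe. The helper name nested_item_query and the sample values are assumptions made for illustration; the printed body is what you could paste into the Kibana console.

```python
import json


def nested_item_query(name, min_quantity):
    """Build a search body that matches orders containing one item
    with the given name AND at least the given quantity.

    Hypothetical helper for illustration; only the JSON structure it
    returns is sent to Elasticsearch.
    """
    return {
        "query": {
            "nested": {
                # Path of the nested field defined in the mapping.
                "path": "item",
                # Both conditions are evaluated on the same nested object.
                "query": {
                    "bool": {
                        "must": [
                            {"term": {"item.name": name}},
                            {"range": {"item.quantity": {"gte": min_quantity}}},
                        ]
                    }
                },
            }
        }
    }


body = nested_item_query("tshirt", 5)
print("POST test/_search")
print(json.dumps(body, indent=2))
```

Because the inner bool query runs against each nested document separately, an order matches only when a single item satisfies both conditions.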
The life cycle of nested objects is tied to their parent: deleting or updating a parent automatically deletes or updates all of its nested children. Changing the parent means Elasticsearch will do the following:
- Mark the old document as deleted.
- Mark all of its nested documents as deleted.
- Index the new document version.
- Index all nested documents.
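The steps above can be sketched with a toy Python model. It is an assumption-laden simplification: a segment is represented as a plain list, the function names and sample order are hypothetical, and details such as document ordering inside a block or segment merging are not modeled.

```python
# Toy model of the update steps listed above. A "segment" here is just a
# Python list; real Lucene segments are far more involved.


def index_order(segment, order_id, items, version):
    """Append the parent document, then one document per nested item."""
    segment.append(
        {"kind": "parent", "order": order_id, "version": version,
         "deleted": False}
    )
    for item in items:
        segment.append(
            {"kind": "nested", "order": order_id, "version": version,
             "deleted": False, "source": item}
        )


def update_order(segment, order_id, items, version):
    """Mark the old parent and nested documents as deleted, then reindex."""
    for doc in segment:
        if doc["order"] == order_id and not doc["deleted"]:
            doc["deleted"] = True
    index_order(segment, order_id, items, version)


segment = []
index_order(segment, "1234", [{"name": "tshirt"}, {"name": "socks"}], version=1)
update_order(
    segment, "1234",
    [{"name": "tshirt"}, {"name": "socks"}, {"name": "hat"}],
    version=2,
)

live = [d for d in segment if not d["deleted"]]
deleted = [d for d in segment if d["deleted"]]
print(len(segment), len(live), len(deleted))  # 7 4 3
```

Even though only one item was added, the whole block (parent plus every nested item) is written again, which is worth keeping in mind for orders with many items that change often.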
There's more...
Sometimes, you must propagate information about the nested objects to their parent or root objects. This is mainly done to build simpler queries on the parents (such as terms queries that do not use nested ones). To achieve this, two special properties of nested objects can be used:
- include_in_parent: This makes it possible to automatically add the nested fields to the immediate parent.
- include_in_root: This adds the nested object fields to the root object.
These settings add data redundancy, but they reduce the complexity of some queries, which can improve query performance.
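A minimal sketch of how these properties might be used is shown below. It builds, in Python, a variant of this recipe's mapping with include_in_root enabled, together with a plain term query that no longer needs a nested wrapper. The variable names and the sample value "tshirt" are assumptions for illustration, and only a subset of the item fields is shown.

```python
import json

# Variant of the recipe's mapping: the nested item fields are also copied
# to the root document as standard (flat) fields.
mapping = {
    "properties": {
        "id": {"type": "keyword"},
        "item": {
            "type": "nested",
            "include_in_root": True,
            "properties": {
                "name": {"type": "keyword"},
                "quantity": {"type": "long"},
            },
        },
    }
}

# With the fields copied to the root, a standard term query can find
# orders that contain an item with this name, without a nested query.
simple_query = {"query": {"term": {"item.name": "tshirt"}}}

print(json.dumps(mapping, indent=2))
print(json.dumps(simple_query, indent=2))
```

Note that the copied fields are flattened like those of a standard object, so the simple query can tell that an order contains a given item name, but conditions that combine several fields of the same item still call for a nested query.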
See also
- Nested objects require a special query to search for them – this will be discussed in the Using nested queries recipe of Chapter 6, Relationships and Geo Queries.
- The Managing a child document with a join field recipe shows another way to manage child/parent relationships between documents.