Managing a child document with a join field
In the previous recipe, we saw how to manage relationships between objects with the nested object type. The disadvantage of nested objects is their dependence on their parent: to change the value of a nested object, you must reindex the whole parent document, which can cause a performance overhead if the nested objects change frequently. To solve this problem, Elasticsearch allows you to define child documents.
Getting ready
You will need an up-and-running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe of Chapter 1, Getting Started.
To execute the commands in this recipe, you can use any HTTP client, such as curl (https://curl.haxx.se/), Postman (https://www.getpostman.com/), or similar. I suggest using the Kibana console, which provides code completion and better character escaping for Elasticsearch.
How to do it…
In the following example, we have two related objects: an Order and an Item.
Their UML representation is as follows:
The final mapping should merge the field definitions of both Order and Item, as well as use a special field (join_field, in this example) that takes the parent/child relationship.
To use join_field, follow these steps:
- First, we must define the mapping, as follows:
PUT test/_mapping
{
  "properties": {
    "join_field": {
      "type": "join",
      "relations": {
        "order": "item"
      }
    },
    "id": { "type": "keyword" },
    "date": { "type": "date" },
    "customer_id": { "type": "keyword" },
    "sent": { "type": "boolean" },
    "name": { "type": "text" },
    "quantity": { "type": "integer" },
    "vat": { "type": "double" }
  }
}
The preceding mapping is very similar to the one in the previous recipe.
- If we want to store the joined records, we will need to save the parent first and then the children, like so:
PUT test/_doc/1?refresh
{
  "id": "1",
  "date": "2018-11-16T20:07:45Z",
  "customer_id": "100",
  "sent": true,
  "join_field": "order"
}

PUT test/_doc/c1?routing=1&refresh
{
  "name": "tshirt",
  "quantity": 10,
  "price": 4.3,
  "vat": 8.5,
  "join_field": {
    "name": "item",
    "parent": "1"
  }
}
The child item requires special handling: we must pass the parent's ID as the routing value (1 in the preceding example). Furthermore, in join_field, we must specify the relation name of the child (item) and the ID of its parent.
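The rule for indexing a child can be sketched as a small Python helper that only builds the request and does not contact a cluster. The function name child_index_request and its parameters are hypothetical, introduced here for illustration; the index, IDs, and fields are the ones from the preceding example.

```python
import json
from urllib.parse import urlencode


def child_index_request(index, child_id, parent_id, relation, fields,
                        join_field="join_field"):
    """Return (method, path, body) for indexing a child document."""
    # The routing value is the parent's ID, so the child is stored
    # on the same shard as its parent.
    query = urlencode({"routing": parent_id})
    body = dict(fields)
    # The join field carries the child's relation name and the parent's ID.
    body[join_field] = {"name": relation, "parent": parent_id}
    return "PUT", f"/{index}/_doc/{child_id}?{query}", body


method, path, body = child_index_request(
    "test", "c1", "1", "item",
    {"name": "tshirt", "quantity": 10, "price": 4.3, "vat": 8.5},
)
print(method, path)
print(json.dumps(body, indent=2))
```

Deriving both the routing value and the parent entry from a single parent_id argument keeps the two from drifting apart, which is the most common mistake when the request is written by hand.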
How it works…
When several related document types share the same index, the mapping must be the union of the field definitions of all of them.
The relationship between objects must be defined in join_field.
A mapping can contain only a single join field; if you need to define several relationships, declare them all in its relations object.
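As an illustration of several relationships in one relations object, the following Python sketch declares a parent with two kinds of children plus a second level, and inverts the object into a child-to-parent lookup. The relation names shipment and review, and the helper parent_of, are hypothetical and are not part of this recipe's mapping.

```python
def parent_of(relations):
    """Invert a join `relations` object into a child -> parent lookup."""
    lookup = {}
    for parent, children in relations.items():
        # A parent may declare a single child name or a list of child names.
        if isinstance(children, str):
            children = [children]
        for child in children:
            # A relation may have several children but only one parent.
            if child in lookup:
                raise ValueError(
                    f"{child!r} already has parent {lookup[child]!r}"
                )
            lookup[child] = parent
    return lookup


# "shipment" and "review" are hypothetical names used only for illustration.
join_mapping = {
    "type": "join",
    "relations": {
        "order": ["item", "shipment"],  # one parent, two kinds of children
        "item": "review",               # second level: item is also a parent
    },
}

print(parent_of(join_mapping["relations"]))
```

A lookup like this is handy on the client side to decide which relation name and which parent type to use when indexing a given kind of child.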
A child document must be indexed in the same shard as its parent; so, when indexing it, an extra parameter, routing, must be passed (we'll learn how to do this in the Indexing a document recipe in Chapter 3, Basic Operations).
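To see why the routing value matters, here is a deliberately simplified Python model of shard selection: a hash of the routing value taken modulo the number of shards. This is a sketch under assumptions: CRC32 is only a stand-in, not the hash function Elasticsearch uses internally, and the shard count of 5 is arbitrary.

```python
import zlib


def shard_for(routing_value, number_of_shards):
    """Simplified stand-in for shard selection: hash(routing) % shard count."""
    # Assumption: CRC32 is used here only for illustration; it is not
    # the hash function Elasticsearch uses internally.
    return zlib.crc32(routing_value.encode("utf-8")) % number_of_shards


shards = 5  # arbitrary shard count for the illustration

parent_shard = shard_for("1", shards)             # parent routed by its own ID
child_with_routing = shard_for("1", shards)       # child c1 sent with routing=1
child_without_routing = shard_for("c1", shards)   # child routed by its own ID

# Same routing value, therefore always the same shard.
print(parent_shard == child_with_routing)
# Without explicit routing, the child may land on a different shard.
print(parent_shard, child_without_routing)
```

Whatever the real hash function is, the point carries over: two documents are guaranteed to share a shard only when they are indexed with the same routing value.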
Changing the values of a child document doesn't require reindexing its parent. Consequently, indexing, reindexing (updating), and deleting child documents are fast.
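The following Python sketch builds the requests for updating and deleting a single child without touching its parent. It assumes the typeless _update and _doc endpoints of Elasticsearch 7 or later; the helper names are hypothetical, and no request is actually sent.

```python
from urllib.parse import urlencode


def child_update_request(index, child_id, parent_id, changes):
    """Return (method, path, body) for a partial update of one child."""
    # The child lives on the parent's shard, so the same routing
    # value used at index time must be passed again.
    query = urlencode({"routing": parent_id})
    return "POST", f"/{index}/_update/{child_id}?{query}", {"doc": changes}


def child_delete_request(index, child_id, parent_id):
    """Return (method, path, body) for deleting one child."""
    query = urlencode({"routing": parent_id})
    return "DELETE", f"/{index}/_doc/{child_id}?{query}", None


update = child_update_request("test", "c1", "1", {"quantity": 12})
delete = child_delete_request("test", "c1", "1")
print(*update)
print(*delete)
```

Neither request refers to the parent document's content: only the child's ID and the routing value are needed.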
There's more...
In Elasticsearch, we have different ways to manage relationships between objects, as follows:
- Embedding with type=object: This is implicitly managed by Elasticsearch, which treats the embedded object as part of the main document. It's fast, but you need to reindex the main document to change the value of the embedded object.
- Nesting with type=nested: This allows you to accurately search and filter the parent by using nested queries on children. Everything works as it does for the embedded object, except for querying (you must use a nested query to search for them).
- External children documents: Here, the children are external documents, with a join_field property to bind them to the parent. They must be indexed in the same shard as the parent. The join with the parent is a bit slower than the nested one. This is because nested objects are stored in the same data block as the parent in the Lucene index and are loaded with the parent, whereas child documents require more read operations.
Choosing how to model the relationship between objects depends on your application scenario.
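To make the querying difference between the last two options concrete, this Python sketch builds a nested query body and a has_child query body side by side. The nested path items is a hypothetical field name chosen for illustration; the has_child body uses the item relation and the name field from this recipe's mapping.

```python
import json

# Assumption: "items" is a hypothetical nested field holding an order's items.
nested_search = {
    "query": {
        "nested": {
            "path": "items",
            "query": {"match": {"items.name": "tshirt"}},
        }
    }
}

# The join variant targets the child relation name ("item") from the mapping
# and returns the parent orders that have at least one matching child.
has_child_search = {
    "query": {
        "has_child": {
            "type": "item",
            "query": {"match": {"name": "tshirt"}},
        }
    }
}

for title, body in (("nested", nested_search), ("has_child", has_child_search)):
    print(f"# {title}")
    print(json.dumps(body, indent=2))
```

In both cases the inner query is an ordinary query; what changes is the wrapper, which tells Elasticsearch whether the children are stored inside the parent or as separate documents.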
Tip
Another approach is to decouple the join relationship and run it in two steps, although it performs poorly on large datasets: first, collect the IDs of the children (or other related documents), and then search for them in a field of their parent.
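The two steps can be sketched in Python as follows. The parent field item_ids, the helper names, and the sample response are hypothetical; the sketch only shows how the IDs from the first search feed a terms query in the second, and it does not contact a cluster.

```python
def collect_ids(search_response):
    """Step 1: extract the document IDs from a search response."""
    return [hit["_id"] for hit in search_response["hits"]["hits"]]


def parents_by_child_ids(child_ids, parent_field="item_ids"):
    """Step 2: build a terms query on the parent field that stores child IDs."""
    # Assumption: "item_ids" is a hypothetical keyword field on the parent.
    return {"query": {"terms": {parent_field: child_ids}}}


# Hypothetical, trimmed-down response of the first search on the children.
child_response = {"hits": {"hits": [{"_id": "c1"}, {"_id": "c7"}]}}

step_two = parents_by_child_ids(collect_ids(child_response))
print(step_two)
```

The size of the terms list grows with the number of matching children, which is the reason this approach degrades on large result sets.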
See also
Please refer to the Using the has_child query, Using the top_children query, and Using the has_parent query recipes of Chapter 6, Relationships and Geo Queries, for more details on child/parent queries.