Before we jump into doc values, let's quickly refresh what an inverted index is and why it is needed. Let's say we have the following documents:
- Doc 1: Apple
- Doc 2: Apple
- Doc 3: Samsung
The inverted index for the preceding documents looks like the following:
| Term    | Doc ID |
| Apple   | 1, 2   |
| Samsung | 3      |
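As a rough illustration of the table above, here is a minimal Python sketch of how such a term-to-document mapping could be built. It is not how Elasticsearch or Lucene implements the index internally; the `docs` dictionary and the `build_inverted_index` helper are hypothetical names used only for this example.

```python
from collections import defaultdict

# Hypothetical in-memory documents: doc ID -> manufacturer value.
docs = {1: "Apple", 2: "Apple", 3: "Samsung"}


def build_inverted_index(documents):
    """Map each term to the list of document IDs that contain it."""
    index = defaultdict(list)
    # Visit documents in ID order so each posting list stays sorted.
    for doc_id in sorted(documents):
        term = documents[doc_id]
        index[term].append(doc_id)
    return dict(index)


inverted_index = build_inverted_index(docs)
print(inverted_index)  # {'Apple': [1, 2], 'Samsung': [3]}
```

Looking up a term is then a single dictionary access, for example `inverted_index["Apple"]`, which is why term-based searches are fast with this structure.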
To find all the products manufactured by Apple, we would simply use a match query as shown here:
{
  "query": {
    "match": {
      "manufacturer": "Apple"
    }
  }
}
With the help of the inverted index, we can quickly look up all the documents associated with the term Apple. But sorting and aggregations need the opposite lookup: given a document, find its terms. To answer that using the inverted index, we would have to go through the entire terms list and collect the document IDs, which is impractical for a large index. To solve this problem, doc values were introduced. Doc values for the preceding...