Modeling time series data
We often need to store time series data in Elasticsearch. The typical approach is to create a single index that holds all documents. This one-big-index approach has limitations, particularly in the following areas:
- Scaling the index with an unpredictable volume over time
- Changing the mapping over time
- Automatically deleting older documents
Let's look at how each problem manifests itself when we choose a single monolithic index.
Scaling the index with unpredictable volume over time
One of the most difficult decisions when creating an Elasticsearch cluster and its indices is how many primary shards and how many replica shards each index should have.
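To make this choice concrete, here is a minimal Python sketch of the request body that carries it when an index is created. The settings names number_of_shards and number_of_replicas are the Elasticsearch index settings for the primary and replica counts. The helper functions and the values 5 and 1 are illustrative assumptions, not recommendations.

```python
import json


def build_index_settings(primary_shards: int, replicas: int) -> dict:
    """Build the body of a create-index request (hypothetical helper)."""
    if primary_shards < 1:
        raise ValueError("an index needs at least one primary shard")
    if replicas < 0:
        raise ValueError("the replica count cannot be negative")
    return {
        "settings": {
            "index": {
                # Chosen when the index is created.
                "number_of_shards": primary_shards,
                # Number of replica copies kept for each primary shard.
                "number_of_replicas": replicas,
            }
        }
    }


def total_shard_copies(primary_shards: int, replicas: int) -> int:
    """Count every shard the cluster must host: primaries plus their replicas."""
    return primary_shards * (1 + replicas)


# Example values only; the right numbers depend on data volume and cluster size.
body = build_index_settings(primary_shards=5, replicas=1)
print(json.dumps(body, indent=2))
print(total_shard_copies(5, 1))
```

The printed JSON is what would be sent as the body of the create-index request. The replica count can be adjusted on a live index, whereas the primary shard count is set at creation time, which is why it is hard to choose when the future data volume is unknown.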
Let's see why the number of shards matters in the following subsections:
- Unit of parallelism in Elasticsearch
- The effect of the number of shards on the relevance score
- The effect of the number of shards on the accuracy of aggregations