Archiving time-based data
With time-based data, the most useful data is usually the most recent. As time passes, the relevance of older data falls rapidly, and data that was indexed earlier sits in the cluster largely unused. This wastes resources, because a large volume of data is stored while serving little or no purpose.
We can apply several levels of archiving, as follows:
Keep the hottest index on the machines that have the best hardware (shard filtering).
Run the optimize API on indices that are no longer being written to.
Close indices that are not required for instant search.
Take a snapshot and archive older indices.
Finally, remove indices that are no longer required.
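The levels above can be read as a policy keyed on the age of an index. The following is a minimal sketch of that idea, not an implementation that talks to a cluster: it only labels each daily index with the level it would fall under. The age cut-offs, the `logs` prefix, the date-based naming pattern, and the function names are all assumptions made for illustration.

```python
from datetime import date, timedelta

# Hypothetical age cut-offs in days (assumptions, not recommendations).
# An index younger than the cut-off gets the paired action.
THRESHOLDS = [
    (1, "keep on hot nodes"),
    (7, "optimize"),
    (30, "close"),
    (90, "snapshot and archive"),
]


def archive_action(index_day, today):
    """Return the archiving level for an index created on index_day."""
    age_days = (today - index_day).days
    for limit, action in THRESHOLDS:
        if age_days < limit:
            return action
    # Older than every cut-off: the index is no longer required.
    return "delete"


def daily_index_name(day, prefix="logs"):
    """Build a daily index name; the prefix and pattern are assumed."""
    return f"{prefix}-{day:%Y.%m.%d}"


if __name__ == "__main__":
    today = date(2024, 1, 31)
    for age in (0, 3, 14, 60, 120):
        day = today - timedelta(days=age)
        print(daily_index_name(day), "->", archive_action(day, today))
```

Keeping the cut-offs in one ordered list makes the policy easy to adjust, since the right values depend on how long older data is still searched in a given deployment.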
Shard filtering
With time-based data, recent indices are used more frequently and are more relevant. In other words, at any given time, incoming data is written to a specific index determined by the day, week, or month. This index...