Controlling cardinality
In the previous chapter, we discussed the concept of data cardinality. To refresh your memory, cardinality refers to the measure of unique values in a dataset. We know that time series databases in general have difficulty managing high cardinality datasets without this resulting in severe impacts on query performance.
This cardinality issue isn’t even necessarily specific to time series databases – relational databases such as MySQL or PostgreSQL also need to take cardinality into account when selecting data. In relational databases, large tables are often partitioned to improve query performance. That’s not an option with Prometheus’s TSDB, especially since you could consider the data to already be partitioned since each block functions as an independent database. Consequently, the closest we can get to partitioning data based on something other than time is by sharding, as we discussed in the previous chapter.
However, even...