Description
Problems
When a querier selects series from a TSDB block using label matchers, it calls the PostingsForMatchers method to fetch the postings for each matcher, then intersects and merges them into a final list of expanded postings that is used to look up the series.
The code can be found here.
PostingsForMatchers can be very expensive in several cases:
- A query matches a large number of postings, either because it has many matchers or because the matched postings have high cardinality. Fetching the postings is time-consuming, and so is merging and intersecting them.
- A query contains expensive regex matchers, which take a large amount of CPU time because they must be evaluated against every label value.
For use cases such as rules that query a time range longer than 2h, for example avg_over_time(xxx{matchers="..."}[24h]), PostingsForMatchers is executed over and over again for the same TSDB blocks.
Proposal
If we introduce a local in-memory cache in TSDB for expanded postings, that is, the results of PostingsForMatchers, a large amount of CPU cycles can be saved.
We can start by caching expanded postings for TSDB blocks on disk (not Head) because blocks are immutable.
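The idea can be illustrated with a minimal cache-aside sketch. Everything here is hypothetical and only for illustration: `postingsCache`, `getOrCompute`, `expandedPostings`, the string key, and the `[]uint64` result type are assumptions, not existing TSDB code. The `compute` callback stands in for a call to PostingsForMatchers.

```go
package main

import (
	"fmt"
	"sync"
)

// postingsCache is a hypothetical in-memory cache of expanded postings,
// keyed by an opaque string (for example block ID plus matchers).
type postingsCache struct {
	mu      sync.Mutex
	entries map[string][]uint64
}

func newPostingsCache() *postingsCache {
	return &postingsCache{entries: make(map[string][]uint64)}
}

// getOrCompute returns the cached result for key, or calls compute on a miss
// and stores its result. Concurrent misses for the same key may each call
// compute; this sketch does not deduplicate them.
func (c *postingsCache) getOrCompute(key string, compute func() ([]uint64, error)) ([]uint64, error) {
	c.mu.Lock()
	v, ok := c.entries[key]
	c.mu.Unlock()
	if ok {
		return v, nil
	}
	v, err := compute()
	if err != nil {
		return nil, err
	}
	c.mu.Lock()
	c.entries[key] = v
	c.mu.Unlock()
	return v, nil
}

// expandedPostings bypasses the cache for the Head, whose contents still
// change, and caches results for immutable on-disk blocks.
func expandedPostings(c *postingsCache, immutable bool, key string, compute func() ([]uint64, error)) ([]uint64, error) {
	if !immutable {
		return compute()
	}
	return c.getOrCompute(key, compute)
}

func main() {
	c := newPostingsCache()
	calls := 0
	// compute stands in for the expensive PostingsForMatchers call.
	compute := func() ([]uint64, error) {
		calls++
		return []uint64{1, 5, 9}, nil
	}
	for i := 0; i < 3; i++ {
		if _, err := expandedPostings(c, true, "block-A|job=api", compute); err != nil {
			panic(err)
		}
	}
	fmt.Println("compute calls for the block:", calls) // prints 1
}
```

In this sketch, repeated evaluations against the same block pay the postings cost once, while Head queries always recompute.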
If the cache is a single instance in the TSDB, the interface can be similar to what Thanos has here. The block ID and the matchers together uniquely identify one set of expanded postings stored in TSDB.
// StoreExpandedPostings stores expanded postings for a set of label matchers.
StoreExpandedPostings(blockID ulid.ULID, matchers []*labels.Matcher, v []byte)
// FetchExpandedPostings fetches expanded postings and returns cached data and a boolean value representing whether it is a cache hit or not.
FetchExpandedPostings(ctx context.Context, blockID ulid.ULID, matchers []*labels.Matcher) ([]byte, bool)