Description
Proposal
Storing the scrape cache keys in a trie instead of a Go map could reduce memory usage.
Currently, both the series cache and the dropped series cache are implemented as Go maps with the scraped metrics string as the key. These strings are usually very similar for multiple lines.
To estimate the reduction in memory, let's look at these lines from an example:
http_request_duration_seconds_bucket{le="0.05"} 24054
http_request_duration_seconds_bucket{le="0.1"} 33444
http_request_duration_seconds_bucket{le="0.2"} 100392
http_request_duration_seconds_bucket{le="0.5"} 129389
http_request_duration_seconds_bucket{le="1"} 133988
http_request_duration_seconds_bucket{le="+Inf"} 144320
http_request_duration_seconds_sum 53423
http_request_duration_seconds_count 144320
The map keys are 47, 46, 46, 46, 44, 47, 33, and 35 bytes long, 344 bytes in total (plus some overhead for string headers, etc.).
If stored as a trie, we should expect the first 30 bytes (http_request_duration_seconds_) to be stored once, followed by a suffix of bucket{le="0. and various other smaller suffixes. The keys should then total under a hundred bytes, although the trie data structure itself would have higher overhead.
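To sanity-check that estimate, here is a rough sketch of a compressed trie (radix tree) that only counts the key bytes it would store. The `node` type, `insert`, `labelBytes`, and `mapKeyBytes` are hypothetical names made up for this illustration, not existing Prometheus code, and the per-node overhead (pointers, slice headers) is deliberately left out of the count.

```go
package main

import "fmt"

// exampleKeys are the series strings from the example above, without values.
var exampleKeys = []string{
	`http_request_duration_seconds_bucket{le="0.05"}`,
	`http_request_duration_seconds_bucket{le="0.1"}`,
	`http_request_duration_seconds_bucket{le="0.2"}`,
	`http_request_duration_seconds_bucket{le="0.5"}`,
	`http_request_duration_seconds_bucket{le="1"}`,
	`http_request_duration_seconds_bucket{le="+Inf"}`,
	`http_request_duration_seconds_sum`,
	`http_request_duration_seconds_count`,
}

// node is one vertex of a hypothetical radix tree. label holds the bytes on
// the edge leading into this node; terminal marks the end of a stored key.
type node struct {
	label    string
	children []*node
	terminal bool
}

// commonPrefixLen returns how many leading bytes a and b share.
func commonPrefixLen(a, b string) int {
	n := 0
	for n < len(a) && n < len(b) && a[n] == b[n] {
		n++
	}
	return n
}

// insert adds key below n, splitting an existing edge where the key diverges.
func (n *node) insert(key string) {
	if key == "" {
		n.terminal = true
		return
	}
	for _, c := range n.children {
		p := commonPrefixLen(c.label, key)
		if p == 0 {
			continue
		}
		if p < len(c.label) {
			// Split: c keeps the shared prefix, its old tail moves one level down.
			tail := &node{label: c.label[p:], children: c.children, terminal: c.terminal}
			c.label = c.label[:p]
			c.children = []*node{tail}
			c.terminal = false
		}
		c.insert(key[p:])
		return
	}
	n.children = append(n.children, &node{label: key, terminal: true})
}

// labelBytes sums the edge label bytes stored in the subtree rooted at n.
func (n *node) labelBytes() int {
	total := len(n.label)
	for _, c := range n.children {
		total += c.labelBytes()
	}
	return total
}

// mapKeyBytes sums the key lengths, as a plain map[string]T would store them.
func mapKeyBytes(keys []string) int {
	total := 0
	for _, k := range keys {
		total += len(k)
	}
	return total
}

func main() {
	root := &node{}
	for _, k := range exampleKeys {
		root.insert(k)
	}
	fmt.Printf("map keys: %d bytes, trie labels: %d bytes\n",
		mapKeyBytes(exampleKeys), root.labelBytes())
}
```

For the eight keys above this reports 344 bytes of map keys against 73 bytes of trie labels. A real comparison would also have to include the node overhead, which this sketch ignores.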
Some exporters have far longer lines, e.g. these from kube-state-metrics are identical for over 100 bytes:
kube_deployment_status_condition{namespace="redacted-namespace",deployment="redacted-deployment-name-very-interesting",condition="Progressing",status="true"} 1
kube_deployment_status_condition{namespace="redacted-namespace",deployment="redacted-deployment-name-very-interesting",condition="Progressing",status="false"} 0
kube_deployment_status_condition{namespace="redacted-namespace",deployment="redacted-deployment-name-very-interesting",condition="Progressing",status="unknown"} 0
kube_deployment_status_condition{namespace="redacted-namespace",deployment="redacted-deployment-name-very-interesting",condition="Available",status="true"} 1
kube_deployment_status_condition{namespace="redacted-namespace",deployment="redacted-deployment-name-very-interesting",condition="Available",status="false"} 0
kube_deployment_status_condition{namespace="redacted-namespace",deployment="redacted-deployment-name-very-interesting",condition="Available",status="unknown"} 0
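The shared prefix of these lines can be measured with a few lines of Go. This is only an illustrative sketch: `kubeKeys` and `sharedPrefixLen` are hypothetical names, and the keys are the lines above with the sample value removed.

```go
package main

import "fmt"

// kubeKeys are the kube-state-metrics series strings above, without values.
var kubeKeys = []string{
	`kube_deployment_status_condition{namespace="redacted-namespace",deployment="redacted-deployment-name-very-interesting",condition="Progressing",status="true"}`,
	`kube_deployment_status_condition{namespace="redacted-namespace",deployment="redacted-deployment-name-very-interesting",condition="Progressing",status="false"}`,
	`kube_deployment_status_condition{namespace="redacted-namespace",deployment="redacted-deployment-name-very-interesting",condition="Progressing",status="unknown"}`,
	`kube_deployment_status_condition{namespace="redacted-namespace",deployment="redacted-deployment-name-very-interesting",condition="Available",status="true"}`,
	`kube_deployment_status_condition{namespace="redacted-namespace",deployment="redacted-deployment-name-very-interesting",condition="Available",status="false"}`,
	`kube_deployment_status_condition{namespace="redacted-namespace",deployment="redacted-deployment-name-very-interesting",condition="Available",status="unknown"}`,
}

// sharedPrefixLen returns the number of leading bytes common to all keys.
func sharedPrefixLen(keys []string) int {
	if len(keys) == 0 {
		return 0
	}
	n := len(keys[0])
	for _, k := range keys[1:] {
		i := 0
		for i < n && i < len(k) && k[i] == keys[0][i] {
			i++
		}
		n = i // the shared prefix can only shrink as more keys are compared
	}
	return n
}

func main() {
	total := 0
	for _, k := range kubeKeys {
		total += len(k)
	}
	fmt.Printf("shared prefix: %d bytes, all keys: %d bytes\n",
		sharedPrefixLen(kubeKeys), total)
}
```

The shared prefix runs up to and including condition=", so a map stores that part six times where a trie would store it once.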
The trade-off is that a trie would be a bit slower to look up than a map.
Previous work in this area:
- #13050
- #12443