prometheus/prometheus

Idea: store scrape cache as a trie instead of a map.

Open

#16513 opened on April 28, 2025

Labels: component/scraping, help wanted, kind/enhancement

Description

Proposal

This could reduce the memory used by the scrape cache.

Currently, both the series cache and the dropped series cache are implemented as Go maps keyed by the scraped metric string. These strings are usually very similar across lines, often sharing long prefixes.
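A simplified model of the current layout, for illustration only: the type and field names below are hypothetical and trimmed down, not the actual Prometheus definitions. It shows that each map key is a full series string, so a prefix shared by many series is stored again in every key.

```go
package main

import "fmt"

// cacheEntry is a hypothetical, trimmed-down per-series value; the real
// Prometheus type differs.
type cacheEntry struct {
	ref      uint64 // storage reference assigned to the series
	lastIter uint64 // scrape iteration in which the series was last seen
}

// seriesCaches models the two map-based caches: each key is the full
// scraped series string, so a prefix shared by many series is stored
// once per key.
type seriesCaches struct {
	series        map[string]*cacheEntry
	droppedSeries map[string]uint64 // value: last iteration seen (assumed)
}

func newSeriesCaches() *seriesCaches {
	return &seriesCaches{
		series:        map[string]*cacheEntry{},
		droppedSeries: map[string]uint64{},
	}
}

// keyBytes counts the string bytes held as keys across both maps,
// ignoring string headers and map bucket overhead.
func (c *seriesCaches) keyBytes() int {
	n := 0
	for k := range c.series {
		n += len(k)
	}
	for k := range c.droppedSeries {
		n += len(k)
	}
	return n
}

func main() {
	c := newSeriesCaches()
	c.series["http_request_duration_seconds_sum"] = &cacheEntry{ref: 1}
	c.series["http_request_duration_seconds_count"] = &cacheEntry{ref: 2}
	// The 30-byte prefix "http_request_duration_seconds_" is held twice.
	fmt.Println(c.keyBytes()) // 33 + 35 = 68
}
```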

To estimate the reduction in memory, let's look at these lines from an example:

http_request_duration_seconds_bucket{le="0.05"} 24054
http_request_duration_seconds_bucket{le="0.1"} 33444
http_request_duration_seconds_bucket{le="0.2"} 100392
http_request_duration_seconds_bucket{le="0.5"} 129389
http_request_duration_seconds_bucket{le="1"} 133988
http_request_duration_seconds_bucket{le="+Inf"} 144320
http_request_duration_seconds_sum 53423
http_request_duration_seconds_count 144320

The map keys are 47, 46, 46, 46, 44, 47, 33 and 35 bytes long, 344 bytes in total (plus some overhead for string headers, etc.). If stored as a trie, we should expect the shared first 30 bytes (http_request_duration_seconds_) to be stored once, then the bucket{le=" prefix once for all six buckets, 0. once for the four buckets below 1, and various shorter suffixes. That should come to under a hundred bytes of key data, although the trie data structure itself would have higher overheads.
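The estimate above can be checked with a small sketch, assuming the keys are the series strings before the value. It builds a plain byte trie (one node per distinct key prefix) and uses it only for counting: the number of non-root nodes equals the total edge-label bytes of the equivalent path-compressed (radix) trie, and the nodes where a key ends or the path branches are exactly the edges that survive compression. All names are hypothetical.

```go
package main

import "fmt"

// exampleKeys are the series strings from the histogram example above.
var exampleKeys = []string{
	`http_request_duration_seconds_bucket{le="0.05"}`,
	`http_request_duration_seconds_bucket{le="0.1"}`,
	`http_request_duration_seconds_bucket{le="0.2"}`,
	`http_request_duration_seconds_bucket{le="0.5"}`,
	`http_request_duration_seconds_bucket{le="1"}`,
	`http_request_duration_seconds_bucket{le="+Inf"}`,
	`http_request_duration_seconds_sum`,
	`http_request_duration_seconds_count`,
}

// node is an uncompressed byte trie node: one node per distinct prefix.
type node struct {
	children map[byte]*node
	terminal bool // a full key ends at this node
}

func newNode() *node { return &node{children: map[byte]*node{}} }

func (n *node) insert(key string) {
	cur := n
	for i := 0; i < len(key); i++ {
		next, ok := cur.children[key[i]]
		if !ok {
			next = newNode()
			cur.children[key[i]] = next
		}
		cur = next
	}
	cur.terminal = true
}

// stats returns the number of non-root nodes (= radix edge-label bytes)
// and the number of nodes that would remain as radix edges: those where
// a key ends or the path does not continue as a single chain.
func (n *node) stats() (byteCount, radixEdges int) {
	for _, child := range n.children {
		byteCount++
		if child.terminal || len(child.children) != 1 {
			radixEdges++
		}
		b, r := child.stats()
		byteCount += b
		radixEdges += r
	}
	return byteCount, radixEdges
}

// summarize returns the bytes stored as flat map keys, the bytes a
// path-compressed trie would store in edge labels, and its edge count.
func summarize(keys []string) (flat, trieBytes, radixEdges int) {
	root := newNode()
	for _, k := range keys {
		flat += len(k)
		root.insert(k)
	}
	trieBytes, radixEdges = root.stats()
	return flat, trieBytes, radixEdges
}

func main() {
	fmt.Println(summarize(exampleKeys)) // 344 73 11
}
```

For this example it prints 344 73 11: 344 bytes as flat map keys versus 73 bytes of edge labels spread over 11 radix edges. A plain byte trie would need 73 nodes here, which is why per-node overhead matters and some form of path compression would likely be needed in practice.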

Some exporters produce far longer lines. For example, these lines from kube-state-metrics are identical for over 100 bytes:

kube_deployment_status_condition{namespace="redacted-namespace",deployment="redacted-deployment-name-very-interesting",condition="Progressing",status="true"} 1
kube_deployment_status_condition{namespace="redacted-namespace",deployment="redacted-deployment-name-very-interesting",condition="Progressing",status="false"} 0
kube_deployment_status_condition{namespace="redacted-namespace",deployment="redacted-deployment-name-very-interesting",condition="Progressing",status="unknown"} 0
kube_deployment_status_condition{namespace="redacted-namespace",deployment="redacted-deployment-name-very-interesting",condition="Available",status="true"} 1
kube_deployment_status_condition{namespace="redacted-namespace",deployment="redacted-deployment-name-very-interesting",condition="Available",status="false"} 0
kube_deployment_status_condition{namespace="redacted-namespace",deployment="redacted-deployment-name-very-interesting",condition="Available",status="unknown"} 0
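To quantify the kube-state-metrics case, a short sketch that computes the longest prefix shared by the six series strings above (values stripped). The function and variable names are hypothetical.

```go
package main

import "fmt"

// kubeKeys are the series strings from the kube-state-metrics example.
var kubeKeys = []string{
	`kube_deployment_status_condition{namespace="redacted-namespace",deployment="redacted-deployment-name-very-interesting",condition="Progressing",status="true"}`,
	`kube_deployment_status_condition{namespace="redacted-namespace",deployment="redacted-deployment-name-very-interesting",condition="Progressing",status="false"}`,
	`kube_deployment_status_condition{namespace="redacted-namespace",deployment="redacted-deployment-name-very-interesting",condition="Progressing",status="unknown"}`,
	`kube_deployment_status_condition{namespace="redacted-namespace",deployment="redacted-deployment-name-very-interesting",condition="Available",status="true"}`,
	`kube_deployment_status_condition{namespace="redacted-namespace",deployment="redacted-deployment-name-very-interesting",condition="Available",status="false"}`,
	`kube_deployment_status_condition{namespace="redacted-namespace",deployment="redacted-deployment-name-very-interesting",condition="Available",status="unknown"}`,
}

// commonPrefixLen returns the length in bytes of the longest prefix
// shared by every key.
func commonPrefixLen(keys []string) int {
	if len(keys) == 0 {
		return 0
	}
	n := len(keys[0])
	for _, k := range keys[1:] {
		if len(k) < n {
			n = len(k)
		}
		for i := 0; i < n; i++ {
			if k[i] != keys[0][i] {
				n = i
				break
			}
		}
	}
	return n
}

func main() {
	fmt.Println(commonPrefixLen(kubeKeys))     // 130
	fmt.Println(commonPrefixLen(kubeKeys[:3])) // 151
}
```

All six keys share their first 130 bytes, up to and including condition="; the three Progressing keys share 151 bytes, up to and including status=".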

The trade-off is that a trie would be somewhat slower to look up, since a lookup follows a chain of nodes instead of hashing the key once.
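A minimal sketch of insert and lookup in a path-compressed trie, assuming a hypothetical uint64 series reference as the cached value (the real cache stores more per series). Get compares the key edge by edge and follows one child pointer per edge, which is where the extra lookup cost over a single map hash-and-compare would come from; Put must split an edge whenever a new key diverges inside it.

```go
package main

import "fmt"

// radixNode is a path-compressed trie node. Each edge label holds one or
// more bytes; children are indexed by the first byte of their label.
type radixNode struct {
	label    string
	children map[byte]*radixNode
	ref      uint64 // hypothetical cached series reference
	hasValue bool
}

type radixTree struct{ root radixNode }

func commonPrefix(a, b string) int {
	n := 0
	for n < len(a) && n < len(b) && a[n] == b[n] {
		n++
	}
	return n
}

// Put stores ref under key, splitting an edge when key diverges inside it.
func (t *radixTree) Put(key string, ref uint64) {
	n := &t.root
	for {
		if len(key) == 0 {
			n.ref, n.hasValue = ref, true
			return
		}
		if n.children == nil {
			n.children = map[byte]*radixNode{}
		}
		child, ok := n.children[key[0]]
		if !ok {
			n.children[key[0]] = &radixNode{label: key, ref: ref, hasValue: true}
			return
		}
		p := commonPrefix(key, child.label)
		if p < len(child.label) {
			// Split: an intermediate node takes the shared part of the label.
			mid := &radixNode{
				label:    child.label[:p],
				children: map[byte]*radixNode{child.label[p]: child},
			}
			child.label = child.label[p:]
			n.children[key[0]] = mid
			child = mid
		}
		n, key = child, key[p:]
	}
}

// Get walks edge by edge, checking that each label is a prefix of the key.
func (t *radixTree) Get(key string) (uint64, bool) {
	n := &t.root
	for len(key) > 0 {
		child, ok := n.children[key[0]]
		if !ok || len(key) < len(child.label) || key[:len(child.label)] != child.label {
			return 0, false
		}
		n, key = child, key[len(child.label):]
	}
	return n.ref, n.hasValue
}

func main() {
	var t radixTree
	t.Put(`http_request_duration_seconds_bucket{le="0.1"}`, 1)
	t.Put(`http_request_duration_seconds_bucket{le="0.2"}`, 2)
	t.Put(`http_request_duration_seconds_sum`, 3)
	fmt.Println(t.Get(`http_request_duration_seconds_bucket{le="0.2"}`)) // 2 true
	_, ok := t.Get(`http_request_duration_seconds_bucket`)
	fmt.Println(ok) // false: only a prefix of stored keys
}
```

A real replacement would also need deletion to evict series that stop appearing, and its per-node maps add overhead of their own; both are left out of this sketch.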

Previous work in this area:

  • #13050
  • #12443
